SUMMARY OF PROGRESS


The accomplishments of the first year's work of the three year
program, Computerized Mapping of Disease (MOD), can be summarized
as follows:

(1) The broad outlines of the MOD computerized system
have been clearly defined, along with requirements
of equipment, programs, and personnel.

(2) A general data-analysis vocabulary, a detailed
factor catalog, and a preliminary data-extraction/
collection form have been developed.

(3) A large file of disease (leptospirosis and
hemorrhagic fevers) and related environmental
data has been collected for use in the
data-extraction/processing efforts.

(4) Prototype disease-distribution maps have been
produced, both manually and by a computer/plotter
system.


*  *  *  *  * 


We believe that the project is progressing well and that efforts to
complete it should follow the recommendations included in this report.


INTRODUCTION


During the period 15 November 1965 - 14 November 1966,
the Universities Associated for Research and Education in
Pathology, Inc. (UAREP), and the Armed Forces Institute of
Pathology (AFIP) completed their first year's effort on the
Mapping of Disease (MOD) project entitled The Geographic
Distribution of Infectious Diseases. This project was
developed and programmed as a three-year effort
(Nov. 1965 - Nov. 1968). Consequently, the present report,
dealing with the first year's accomplishments, concentrates
on development of concepts, methods of approach, and specific
software/hardware requirements. There have been important
"output" achievements, but these are of primary interest
because they represent prototypes rather than finished
products.

In addition, the report includes detailed plans and
recommendations for work during the next two years which
should lead to successful completion of the MOD project.


OBJECTIVES, BACKGROUND, AND SCOPE OF THE PROJECT


The Computerized Mapping of Disease Project has two principal
objectives:

The ultimate objective of the program is to develop research techniques
by means of which the occurrence of a particular disease may be
correlated with a variety of sociological, physical, and environmental
factors such as population density, races, ethnic groups, altitude,
temperature, humidity, character of the soil, agricultural products,
and possible insect vectors and animal reservoirs of disease, to give
new insight into cause/effect relationships and to suggest new methods
of disease control. An important potential application of the
technique is in predicting the likelihood that a given disease will
develop in a particular area under specific conditions of ecologic
change, and also in predicting major variations in prevalence, e.g.,
anticipating an epidemic.

The immediate objective of the program is to provide data in the form
of disease distribution maps and atlases, showing prevalence, incidence,
and severity of specific infectious diseases throughout the world
along with the distribution of actual and potential causally related
factors.

By using a computerized system of analysis and output, it will be
possible to produce distribution maps in a matter of minutes rather
than months, as has previously been the case. This will allow
up-dating whenever required. Furthermore, such a system will permit
the production of many more maps than would otherwise be practical,
covering a wide range of ecologic factors. As desired, these could be
printed on transparent stock suitable for overlay assembly in order to
compare one pattern of distribution with another.

Data on the geographic distribution of infectious disease are of
obvious importance in evaluating the disease risk for groups of persons
assigned to foreign posts and in any detailed planning that involves
the socio-economic problems of a particular area. There have been
only two major contributions in this field and these are now seriously
outdated. They are: (a) Geographic Atlas of Disease, prepared by
the American Geographical Society, published during 1950-55; and
(b) World-Atlas of Epidemic Diseases, edited by Professor Ernst
Rodenwaldt (Heidelberg), published in 1952 but reflecting data
gathered years before. Neither of these efforts involved modern data
storage/retrieval/processing methods, nor did either of them attempt
to relate ecologic factors to disease in a manner that would allow
detailed cause/effect analysis.

The Division of Geographic Pathology and the Registry of Geographic
Pathology, both of the Armed Forces Institute of Pathology, have an
intense interest in the geographic distribution and manifestations of
disease and have had much experience in this field. Until 1 April 1963,
the National Academy of Sciences - National Research Council was
responsible for administrative and fiscal matters pertaining to the
conduct of the American Registry of Pathology. Although these
responsibilities have been transferred to a non-profit organization
known as Universities Associated for Research and Education in
Pathology, Inc., the Chief of the Division of Geographic Pathology at
the Institute (H.C.H., Associate Scientist of the MOD Project)
continues to act as Registrar of the Registry of Geographic Pathology
of the American Registry of Pathology. This allows access to a great
deal of disease information from world-wide sources. The Division has
also had considerable experience in gathering, storing, and retrieving
medical data.


The MOD Project represents the first serious effort (to our knowledge)
to develop a computerized disease-mapping system coupled with a
comprehensive data file of ecologic factors. If successful, the
system will provide an important research tool to determine complex
cause/effect relationships among disease and environmental factors.


The kinds of relationships between disease and environmental factors
with which this project deals (causal, associative, or accidental)
are shown in figure 1.


[Figure 1: diagrams of four kinds of relationship between an
ecologic factor (Et) and a disease (Dis):

    Direct cause/effect relationship
    Indirect cause/effect relationship
    Constant association of Et/Dis because of an underlying
        common cause
    Accidental (inconstant) association

(Et = ecologic factor; Dis = disease)]


FIGURE 1.


There are three basic parts to the MOD Project (system) and these are
intimately related to each other in the sequence shown below.


    Locate and get data sources
                |
                v
    Select/extract/format to produce a Data File Base
                |
                v
    Design/implement software/hardware system
    (giving special consideration to form of output)


FIGURE 2


The Data File Base is, obviously, an essential (key) ingredient of
the system since it provides the substance upon which the
software/hardware components act. We realize full well the
difficulties in getting adequate (comprehensive and reliable) basic
data and know that we can never achieve perfection here. However,
our own knowledge and experience, supplemented by information
obtained from personal contacts with many experts throughout the
world, coupled with that information to be found in scientific papers
and reports will, we believe, give us the most effective data base
which has yet been elaborated.

A fundamental and essential consideration in developing the MOD system
is the conversion of narrative or tabular data to a form in which it
can be computer processed. This requires rigid specification of the
form in which the data is to be input. Not only must the data be
processable, it must be mappable. Further, the data must be
selected/extracted so that it is relevant to and significant for the
desired output.

Compromise is inevitable in selecting the form of the input. A natural
or problem oriented language would be easier for the data processor to
use whereas a fixed-form input format would (probably) be easier for
the computer to handle. A proper compromise is one in which the kind
of language/input format developed best suits the total procedure.
Arriving at this proper compromise represents a critical step since an
inappropriate selection would lead to much delay and costly duplication
of effort.
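
To make the fixed-form end of this trade-off concrete, the following is a minimal sketch of how one fixed-column record might be read. The column layout, field names, and sample record are hypothetical and chosen only for illustration; they are not the input format adopted by the MOD Project.

```python
# Hypothetical fixed-column layout: (field name, start column, end column).
# The widths are assumptions for illustration, not the MOD input format.
LAYOUT = [
    ("disease", 0, 14),
    ("latitude", 14, 21),
    ("longitude", 21, 29),
    ("year", 29, 33),
    ("percent_positive", 33, 38),
]


def parse_fixed_record(line):
    """Split one fixed-width record into a dict of typed fields."""
    if len(line) < LAYOUT[-1][2]:
        raise ValueError("record is shorter than the fixed layout requires")
    raw = {name: line[start:end].strip() for name, start, end in LAYOUT}
    return {
        "disease": raw["disease"],
        "latitude": float(raw["latitude"]),
        "longitude": float(raw["longitude"]),
        "year": int(raw["year"]),
        "percent_positive": float(raw["percent_positive"]),
    }


# Build an invented sample record whose columns match LAYOUT.
sample = "{:<14}{:>7}{:>8}{:>4}{:>5}".format(
    "LEPTOSPIROSIS", "13.75", "100.50", "1965", "4.0"
)
record = parse_fixed_record(sample)
print(record)
```

A rigid layout of this kind is simple for a program to read but pushes the burden of exact column placement onto the person preparing the data, which is the tension the paragraph above describes.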

One of the most pressing problems to be solved at this time concerns
input and has to do with quantitative aspects: the measure of the
disease in relation to the size and selectivity of the population
sample. There are many other essential factors, of course, but the
measure of disease is primary.
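
As a rough illustration of why sample size matters to the measure of disease, the sketch below computes a prevalence proportion together with a Wilson score interval. The Wilson interval is a standard statistical technique introduced here only for illustration; the report does not specify any particular method, and the sample counts are invented.

```python
import math


def prevalence_with_interval(positives, sample_size, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if sample_size <= 0 or not 0 <= positives <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= positives <= sample_size")
    p = positives / sample_size
    denom = 1 + z * z / sample_size
    center = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return p, center - half, center + half


# Two invented samples with the same observed prevalence but different sizes.
for positives, n in [(2, 50), (200, 5000)]:
    p, low, high = prevalence_with_interval(positives, n)
    print(f"n={n:5d}  prevalence={p:.3f}  interval=({low:.3f}, {high:.3f})")
```

Both samples report the same proportion, yet the smaller one supports a much wider range of plausible values. Selectivity of the sample (for example, testing only sick animals) is a separate source of bias that no interval of this kind corrects.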

Assuming that an operable data processing system is developed as a
result of our efforts, the true evaluation of its potential as a
technique will be dependent upon and limited by the quality of the
data which is input for processing. Furthermore, achievement of
our immediate objective, "mapping of disease", is as dependent upon
the data as it is upon the processing system.

We emphasize the importance and difficulty of the research effort
necessary to select/extract/format "raw" data in order that it can be
computer processed and output in the form of distribution maps.
Without adequate and properly formatted data, significant output is
impossible. The old term GIGO expresses the situation well:
garbage in/garbage out.

This aspect of the problem will be considered in detail under
"Data-management Considerations".

* * * * * *

In order to accomplish the goals of MOD it was obviously
necessary that we restrict our efforts, and we have assumed certain
self-imposed limitations:

(1) Of the many possible diseases for study we selected
two: leptospirosis and the hemorrhagic fevers. These
were carefully chosen for several reasons: (a) they
are important diseases; (b) we (A.F.I.P.) know a good
deal about them; (c) high-reliability laboratory
diagnosis is possible; (d) they are widespread in
distribution, but not completely diffuse; (e) more
reliable distribution maps are badly needed; (f) each
of the two diseases poses specific data processing
challenges in relation to important ecologic factors,
examples of which include:

Leptospirosis:  Involves many mammalian reservoirs,
                both domestic and wild.

                Is greatly influenced by the amount and
                nature of surface water, including pH,
                mineral content, rate of evaporation, etc.

                Prevalence is greatly influenced by
                occupational and/or recreational habits of
                human beings.

                Severity varies markedly, depending upon
                serotype (and many other factors).

Hemorrhagic
Fevers:         Are often (some types) sharply limited
                by highly restrictive ecologic factors.

                Are arthropod borne for the most part.

                Manifestations are greatly influenced by
                age and by race and by the specific
                causal virus.

(2) Of the enormous number of ecologic factors which could
be studied, we are limiting our studies (data collection) to the
more current reports (where feasible) and to the kind of data
which experience and reason indicate will probably be most
significant. Obviously, data collection/analysis can be extended
into new areas or different time periods should the need arise.

In general, four different types of data sources are
available to the MOD Project:

1. Published prose summaries: monographs, books,
journals, technical notes.

2. Unpublished prose summaries: progress reports,
laboratory reports, letters, oral communications.

3. Unpublished raw data: IBM cards, field notes,
various completed data-collection forms filled out
by other (non-AFIP-UAREP) organizations.

4. Published and unpublished maps.

These  represent  an  extremely  large  quantity  of  potentially 
useful  data  -  much  more  than  we  can  hope  to  assimilate.  We  believe 
that  we  have  been  realistic  in  limiting  our  data  collection  to 
that  which  seems  most  pertinent,  concentrating  on  the  most  recent. 

We  have  not  forgotten  that  the  primary  goal  of  the  MOD  Project  is 
the  development  of  a  computerized  disease-mapping  system  rather  than 
a  comprehensive  collection  of  data.  But  the  system  must  have 
substance  to  work  upon.  The  computer  processing  activity  is  but 
one  side  of  the  coin;  an  adequate  data  file  base  is  the  other. 

(3) The geographic areas selected for mapping have been
(tentatively) limited to three distinctly different scales:

(a) the world, per se, presenting a small-scale
over-all view;

(b) Thailand, presenting a medium-scale view; and

(c) a portion of the "Quadri-county" area of
Southern Illinois, presenting a large-scale detailed
view.

This latter region was chosen because of the intensive ecologic and
zoonotic studies which have been going on there for the past several
years as part of a major interdepartmental research program of the
University of Illinois. Not only will a great deal of detailed
data (much of it unpublished) be available to us, but several of
the key scientists working in the area have expressed their
willingness to generate specific data if necessary to "fill out"
certain information areas of our study.

Although we plan to limit our output maps to three size-scales,
all of the data which we accumulate for computer retrieval/processing
will have geographic factors specified in sufficient detail that
they can be used in the production of a wide range of map scales.

A more detailed discussion of this aspect of the problem is included
in "Data-processing Considerations".

(4) Many kinds of map projections could be used. We have
selected an equirectangular projection because it will provide
economy of computer time/effort. Although this kind of projection
leads to much distortion in the polar areas, distortion is at a
minimum in the tropical zone, the region of greater interest.

Once the data are processed for plotting, subprograms can be
introduced so that the data could be plotted in accordance with
virtually any type of projection desired, e.g., Mercator's,
Goode's homolosine equal area, azimuthal equidistant, etc.
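
The economy of the equirectangular projection, and the idea of swapping in another projection as a subprogram, can be sketched as follows. This is an illustration on a spherical earth with the equator as standard parallel; the function names, the `PROJECTIONS` table, and the sample coordinates are assumptions made for the example, not part of the MOD design.

```python
import math


def equirectangular(lat_deg, lon_deg, scale=1.0):
    """x is proportional to longitude and y to latitude; no trigonometry."""
    return scale * math.radians(lon_deg), scale * math.radians(lat_deg)


def mercator(lat_deg, lon_deg, scale=1.0):
    """Spherical Mercator; undefined at the poles."""
    if abs(lat_deg) >= 90:
        raise ValueError("Mercator is undefined at the poles")
    lat = math.radians(lat_deg)
    x = scale * math.radians(lon_deg)
    y = scale * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def east_west_stretch(lat_deg):
    """East-west exaggeration of the equirectangular map relative to the
    true length of a parallel on a sphere: 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))


# A projection is just an interchangeable function applied to stored
# latitude/longitude pairs (hypothetical lookup table).
PROJECTIONS = {"equirectangular": equirectangular, "mercator": mercator}

# Approximate coordinates, used only as sample input.
points = {"equator": (0.0, 100.5), "Bangkok area": (13.75, 100.5)}

for name, project in PROJECTIONS.items():
    for label, (lat, lon) in points.items():
        x, y = project(lat, lon)
        print(f"{name:16s} {label:13s} x={x:.4f} y={y:.4f}")

for lat in (0, 15, 60, 80):
    print(f"latitude {lat:2d}: east-west stretch {east_west_stretch(lat):.2f}")
```

Because the data are stored as latitude and longitude rather than as plotted positions, changing the projection only changes the function applied at plotting time.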

It is appropriate once again to emphasize that the major
objective of the MOD Project is to develop a system whereby
narrative and tabular data can be collected and preprocessed
(formulated) to a form suitable for subsequent computer processing
and output in the form of distribution maps, graphs (e.g., the
n-factorial three dimensional representation illustrated in
figure 11), tables, and narrative. Although the self-imposed
limitations described above narrow the limits of output we will
seek, they do not narrow the potential limits of the system. The
system is being designed to meet certain needs for information
dealing with infectious disease; however, the same system could be
used, with little modification, to analyze the ecologic factors which
influence efficient stockpiling of corn or aluminium, or the ecologic
factors which influence efficient forest preservation or development
of recreational facilities, or the ecologic factors which influence
efficient development and location of community blood banks or
Medicare treatment centers, etc.

****** 

We have divided the MOD Project into six successive phases,
related to the kind of effort required. Figure four shows these
phases, the different tasks included in each, and the time-effort-
personnel which each will require. A detailed consideration of the
specific tasks comes later; however, the major phases can be
described here.

FIGURE 3. - Schedule of major phases and tasks comprising
MOD Project, Nov 1965 - Nov 1968. The chart indicates
what has been carried out or is currently underway
(Initial Orientation, System Analysis, and Data Analysis
Phases). It shows also what has been tentatively planned
in order to bring the MOD Project to a successful
conclusion within the three year time limit (System Design,
System Implementation, and System Development Testing
Phases).


Initially, we envisioned the MOD Project to be divided into
three simple phases: (1) System Analysis, (2) System Design, and
(3) System Implementation. However, as we grew more knowledgeable,
it became evident that this division was an over-simplification.

The six successive stages which we have settled on are:

Stage one, the Initial Orientation Phase, a time during
which we created a large edge-notched file system covering many
aspects of leptospirosis and associated ecologic factors. In
addition we examined intensively the published literature dealing
with information technology, computers and computer processing, and
automated mapping methods, and had detailed discussions about these
matters with many groups.

Stage two, System Analysis, was effectively performed with
the essential help of personnel of the Planning Research Corporation
(P.R.C.), especially Dr. Jerome Morenoff, Senior Associate, and
Joseph L. Ferguson, Associate,* through subcontract UAREP 66.1.

Work done during Stage two brought into clear focus the importance,
complexity, and variety of the problems concerned with analyzing
and preprocessing disease/environmental data so that it could be
effectively computer processed. It was this insight which led to
Stage three.

Stage three, an interim phase (between the initially
visualized Phases I and II), is concentrating on data analysis.

Many of the problems that we are meeting in this area have not been
effectively dealt with before, and more research effort is, of
necessity, going into this aspect of the study than originally
planned. Obviously, this is one of the basic factors in developing
an effective program. To solve these data analysis problems, one
is required, among other things, to define terms very specifically,
in the course of which it is necessary to develop a glossary or
data vocabulary. One of the reasons why this is such a difficult
task is that the terms which we must consider are derived from
many different disciplines, and the same word often has a
significantly different meaning when used by the geographer, the
cultural anthropologist, the political economist, the
epidemiologist, etc.


* Subsequently, Wayne L. Richmond replaced Joseph Ferguson.

Furthermore, our work on data analysis has forced a detailed
evaluation of the essential features required of data in order
that they can be meaningfully mapped. This led to the development
of a data structure system to handle effectively various orders of
qualitative/quantitative descriptors.

Again, we called on PRC for professional data-processing
assistance, and this was accomplished through subcontract
UAREP 66.2. Currently, we are in this phase of the program. A
principal PRC consultant during the previous contract (Ferguson)
was replaced by Mr. Wayne L. Richmond who, together with
Dr. Morenoff (also of PRC) and Mr. Harry Kline (a programmer with
systems analysis capabilities, employed directly by UAREP), forms a
very effective and highly productive computer oriented team.

A detailed description of the data structure system, a list
of many disease/environmental factors, and preliminary forms for
data extraction/recording are presented in "Data-management
Considerations".

After the Data Analysis Phase is completed (January 1967),
we expect to know enough about the characteristics of disease/
environmental data that we can proceed with developing a computer
system which will process these data effectively.

Stage four, the System Design Phase (our initial Phase two),
is the period during which detailed plans for the system will be
made, the exact equipment needs determined, and the programs
necessary to accomplish the data-processing outlined. We expect to
continue our fruitful collaboration with PRC during this and the
remaining phases.

Stage five, the System Implementation Phase (our initial
Phase three), should begin during the mid-part of our second year
and continue well into the third year of the MOD Project. During
this time, equipment will be obtained and made operational, and the
many required computer programs will be written.

Stage six, the System Development Testing Phase, is expected
to occupy the last months of the three year MOD Project. The
various programs of the MOD computer system will be integrated and
tested with significant volumes of actual disease/environmental
data to produce distribution maps and other forms of print-out.
Numerous minor errors/defects will become apparent and require
correction as we demonstrate and evaluate the system. We also
anticipate the need (desirability) to add new features and
capabilities during this last phase of the project as the potential
of the system becomes more clearly evident, and as the answers to
some questions indicate the desirability of answers to new kinds of
questions. If all goes well, Phase six will culminate in a fully
operational system consisting of two major components:


First, a sub-system for the selection/extraction/
preprocessing of "raw" narrative, tabular, and graphic
data necessary to produce a data file base.

Second, a sub-system of software-hardware which will
incorporate the following into an integrated functional
unit: data file base/editor and file maintenance
programs/retrieval mechanism/processor/report generator
(including map plotting).

After the MOD system is developed and operational, we
anticipate that its many potential applications, medical and
otherwise, will be readily apparent and fully exploited. We anticipate
also that a number of the disease/environmental distribution maps
produced by the system will be suitably reproduced for publication
and wide distribution.


* * * * * *

Most of the work of the Mapping of Disease Project is being
carried out at the Annex of the Armed Forces Institute of
Pathology (7th and Independence Avenue, S.W., Washington, D.C.).

An area of approximately 650 square feet has been assigned for use
by the Project (not including space to be occupied by card punch
equipment, sorters, the computer, etc.). Internal organization of
the MOD Project and personnel involved in work during the past
year are shown in Fig. three. Obviously, additional personnel will
be required to complete the MOD Project within the time allotted,
and Fig. five presents the projected internal organization/personnel
requirements. Specific functions visualized for these new persons
will be discussed under "Data-processing Considerations".


DATA-MANAGEMENT CONSIDERATIONS


An introductory discussion of the general nature and importance
of the data-management aspects of MOD is given on page 9.

With our limited facilities, it is manifestly impossible for
us to generate even a significant part of the data which we wish
to process, and we have fixed on literature search, combined with
information collected by word of mouth, letter, and "private"
(unpublished) report, as our primary sources of information.

LTC Watson (D.V.M.) has concentrated on this, assisted by
SSgt. Thomas H. Morgan, Mrs. Chu, and Mrs. Eisenberg. Over 3,500
selected references dealing with leptospirosis have been collected,
approximately 1800 of which have been abstracted.

In addition, important continuing contacts have been made with
eminent leptospiral research workers, including Dr. A. D. Alexander
(WRAIR), Mrs. Mildred M. Galton (CDC), and Dr. Lyle Hanson (Ill.
Center for Zoonoses Research, U. of Ill.). These persons have
been and will continue to be important sources of unpublished
data and will continue to be very helpful in their critical analysis
of our work.

Data collection dealing with the hemorrhagic fevers is
immediately available to us through the files of LTC Proctor Chile
of the Geographic Pathology Division. Other important sources of
compiled information readily available to us include MARU
(Dr. Karl Johnson), the U. S. Component SEATO Laboratories at
Bangkok (Col James L. Hansen, MC, USA, former Director;
Dr. Sylvanus W. Nye, formerly assigned there; and Capt. Will
Blackburn, presently assigned there, from the Geographic Division,
AFIP), the San Lazarus Hospital, Manila (Dr. Reyes), and the 5th
Epi Flight, Clarke AFB (Col Kremers).

As our coverage of the disease-oriented literature becomes more
nearly complete, we will be able to devote progressively more effort
to the collection of data concerning those environmental factors
which are likely to influence the occurrence or manifestations of
the diseases under study. To this end we have begun a file of
references to published maps which present the distribution of
various environmental factors. We are being assisted in this effort
by our consultant in medical geography, Dr. Warwick Armstrong, and
a graduate student assistant, both of the Department of Geography,
University of Illinois. In addition, we have explored the
possibility of subcontracts to the Biological Sciences
Communication Project (of George Washington University), the
BioSciences Information Service of Biological Abstracts, and the
American Geographical Society, in order to expand our data collection
capability.




Once the data sources have been collected and organized, the
task of extracting the relevant data from the sources begins.

When the MOD Project began, we did not realize that this would
pose unusual problems. However, the further our work progressed,
the more apparent it became that data extraction/preprocessing
presented some very serious problems, far more complex than could
have been anticipated in the beginning. Some of these problems
have been encountered by other groups (we learned) but, often,
they were avoided rather than solved, and, in some instances, this
actually prevented the effective use of much potentially available
data. Some of the important problems in this area have not been
encountered by other groups, so far as we can determine, probably
because no one else has attempted to develop the kind of system
which MOD represents.

Our basic approach to solving these data-extraction problems
has been through a group effort, involving both data-processing
and data-collecting personnel. Attempts to extract and put into
consistent form the data on disease and environmental factors
contained in selected representative data sources were continued
until the data extracted was meaningfully mappable and in a form
acceptable to the data processors as well as the data
collectors/analysts. General requirements for data content/format
were formulated as an indirect result of these efforts, and this
is one of our most important accomplishments to date (largely due
to the efforts of Dr. Guffey).

The first major problem encountered was that no generic terms
existed which encompassed disease/environmental data. Thus it
became necessary to construct a general data-analysis vocabulary
before we could communicate effectively in relation to the
disease/environmental data which we were attempting to extract.

This data-analysis vocabulary includes definitions for and
discussion of the interrelationships among such vitally important
terms as "factor", "common elements", "value", "data point", "map",
and "narrative". The data-analysis vocabulary, in its present
(preliminary) form, is included as Appendix two.
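
The actual definitions are deferred to Appendix two, so the following is only one possible reading of how a "data point" might tie a "factor" and a "value" to the elements of place and time that make it mappable. The class, its field names, and the choice of which elements are optional are all hypothetical; the sample values echo the cattle statement quoted later in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    """One mappable observation (hypothetical structure, not from Appendix two)."""
    factor: str                            # what was measured
    value: float                           # the reported measurement
    unit: str                              # unit of the value
    place: str                             # assumed common element: where
    year: Optional[int] = None             # assumed common element: when
    source_document: Optional[str] = None  # report the value was extracted from

    def missing_elements(self):
        """Names of optional elements that the source did not report."""
        return [
            name
            for name in ("year", "source_document")
            if getattr(self, name) is None
        ]


point = DataPoint(
    factor="leptospirosis in cattle",
    value=4.0,
    unit="percent",
    place="Southern Illinois",
)
print(point.missing_elements())
```

Recording explicitly which elements are absent, rather than silently dropping the observation, is one way to keep incomplete reports usable while flagging their limits.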

The second major problem involved specifying precisely what
items of disease/environmental information were pertinent to our
major objective: the production of distribution maps. This led
us to develop a catalog of disease/environmental factors which
could be used by the MOD computer system in producing disease/
environmental factor-distribution maps. This factor catalog, in
its preliminary form, is included as Appendix three.

We have found that many of the data available for processing
are incomplete in one way or another and, often, professional
judgement/interpretation (sometimes extrapolation) must be carried
out if the data is to be usable. Narrative print-out, to
accompany the computer maps, will note these interpretations, and
source document numbers will be available upon request should
the user wish to consult the data report from which interpretations
were made. Some of the data will be of very limited use because
essential factors (which must have been known to the author) simply
aren't recorded. These problems are numerous and serious, but our
objective is to do the best we can with the information available.
An important by-product of the MOD system will be to emphasize the
factors which must be specified in order for data reports to be
most useful. An example of the kinds of conclusion-statements that
are frequently found (and often very important) is given below.

The limitations of data given in the apparently (at first glance)
simple, straightforward statement are indicated by the questions
that need answers.


"Four percent of cattle in Southern Illinois
have leptospirosis."

1. Over what time period were the data collected?

2. When were the data collected?

3. If this is a conclusion from a composite of different
studies, are we sure that there is no overlapping?

4. What was the size of the sample(s)?

5. What are cattle? (all bovids? a limited number of
species of bovids? a limited number of breeds
within one species? just cows? just mature
animals? etc.?)

6. What was the nature of the sample(s) of "cattle"?

- sick cattle?

- cattle selected because of the State Health
Department's interest in certain regions?

- cattle selected because of University studies
being carried out at specific chosen
(cooperative) farms?

7. Is it likely that the prevalence was uniform throughout
Southern Illinois?

8. What are the precise geographic limits of "Southern
Illinois"?

9. What is "leptospirosis"?

- disease in terms of clinical illness?

- detectable antibodies?

- recoverable organisms?

10. What was the inherent accuracy of the diagnostic procedure(s)?

11. What was the inherent sensitivity of the diagnostic
procedure(s)?

12. How reliable was the laboratory which performed the
analyses?

13. Were the samples for analyses entirely adequate?

14. Is this report completely honest (i.e., is there intent
to mislead)?

15. Were the studies which led to this conclusion well
planned (i.e., was the experimental design good)?

16. If this report is a summary-analysis of a collection, is it
correct (i.e., was there an error in transcription or in
mathematical manipulation of data)?

In retrospect, it seems that there were two reasons why it was
necessary to develop these two conceptually important documents:

First, no attempts nearly so comprehensive as ours have ever been
made (or at least reported) to convert highly varying, narrative
data to a consistent, geographically oriented (and therefore
mappable) format. Previous disease-distribution maps have merely
indicated presence or absence of particular diseases at particular
times in particular locations. What we hope to do is much more
involved than that.

Second, many different academic disciplines have been
independently interested in various aspects of diseases or
environments for a number of years. As a result, many of the terms which
the different disciplines use in discussing specific data have quite
different contexts. These have never been adequately correlated or
even precisely enough defined for use in computer processing.
Epidemiologists, agronomists, pedologists, and geologists, each with
their own bias, all touch upon various characteristics of the soil.
Epidemiologists, geochemists, limnologists, geomorphologists,
ecologists, and recreation-oriented sociologists and anthropologists
all discuss surface waters (lakes and streams) from various points
of view. However, in many instances, these different workers do not
use mutually intelligible terms. The factor catalog which we have
developed represents an attempt to overcome these communications gaps.

Another difficult problem involves assessment of data
reliability. Obviously this requires professional judgement, based
upon a variety of factors not susceptible to rigid formulation. It
has proved unrealistic to break this factor down into more than
three categories: "more reliable", "less reliable", and "unknown".
Even so, such a limited classification will allow useful separation
of "good" data from questionable data. For example, data of a
specific kind could be separated into three categories of reliability
and each category plotted (mapped) on a separate transparent overlay
sheet. Then, by proper arrangement of the sheets, the different
categories could be viewed individually or together in any combination.
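
The overlay scheme just described can be illustrated with a short sketch. The Python fragment below is only an illustration under stated assumptions, not part of the MOD programs: the record fields (lat, lon, value, reliability), the function names, and the sample points are all hypothetical. It groups data points by the three reliability categories and then combines chosen categories, much as transparent sheets would be stacked.

```python
# The three reliability categories named in the report.
RELIABILITY = ("more reliable", "less reliable", "unknown")


def split_into_overlays(points):
    """Group data points into one overlay (list) per reliability category."""
    overlays = {category: [] for category in RELIABILITY}
    for point in points:
        category = point.get("reliability", "unknown")
        if category not in overlays:
            category = "unknown"  # unrecognized labels are treated as unknown
        overlays[category].append(point)
    return overlays


def combine(overlays, *categories):
    """Stack the chosen overlays, like laying transparent sheets together."""
    return [point for category in categories for point in overlays[category]]


# Hypothetical sample points (field names are assumptions).
points = [
    {"lat": 37.7, "lon": -89.2, "value": 4.0, "reliability": "more reliable"},
    {"lat": 38.0, "lon": -88.9, "value": 9.5, "reliability": "less reliable"},
    {"lat": 37.1, "lon": -88.7, "value": 2.0},  # no assessment recorded
]
overlays = split_into_overlays(points)
good_and_unknown = combine(overlays, "more reliable", "unknown")
```

Keeping each category in its own list means any combination can be plotted without re-reading the data, which is the point of the separate overlay sheets.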

Turning to the mechanics of data extraction, the development
of suitable data-extraction forms on which pertinent data can be
effectively recorded has proved to be a very difficult task because
of the highly varying content of the data sources, coupled with
the requirement that the data must relate to a consistent,
geographically-oriented format. Our tentative scheme for handling
these matters (cf. Fig. 5) involves a three-stage process:

(1) Data extractors (necessarily with biomedical background, since
value judgements are required) will fill in relatively simple
data-extraction forms. These forms will be submitted to a data
analyst(s), who (with the help of data consultants as necessary)
will check (edit) the forms and (2) transcribe the data from them to
a more rigidly formatted form. (3) These latter forms will then be
converted to punched cards for input into the computer system.

We have developed a series of data-extraction forms during the past
year, each new form being an improvement over the previous one,
incorporating refinements suggested by the data collectors. We are
now using a data form designed for the particular purpose of
collecting leptospirosis data in Southern Illinois. Only disease
data MOFs (Middle Order Factors - see Appendix 2) appear on this
form (Fig. 6); other forms will include spaces for both disease and
environmental data. However, this form (Fig. 6) is allowing us to
accumulate a significant amount of consistently-formatted disease
data for the first time - data which is of great value at this stage
of our computer designing efforts.

Another data-collection form (Fig. 7) is being used (tested)
to collect information about environmental factors. Our data
collection forms represent a "natural" language form of
communication, designed to conserve the time and patience of our
professional data extractors. The data analyst(s) serves as an
intermediary between the data extractors and the computer. He will
check/edit the data collection forms, noting frank errors,
discrepancies, or omissions, and translate the data to a more rigidly
formulated form from which the punch card operator can, without
further interpretation, convert it to standard 80 column punched
cards, subsequently to be put onto magnetic tape. As necessary, the
data analyst will seek help from a professional data
extractor/consultant to make certain that "translations" are
accurate and that apparent discrepancies or omissions are real.
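
To make the idea of a "rigidly formulated form" concrete, the following Python sketch lays one record out as an 80-column card image and reads it back. It is offered only as an illustration: the report does not give a card layout, so the field names and widths below are hypothetical, chosen simply to total 80 columns.

```python
# Hypothetical field layout: (name, width). Widths total 80 columns.
CARD_LAYOUT = [
    ("source_doc", 6),
    ("latitude", 7),
    ("longitude", 8),
    ("year", 4),
    ("factor_code", 10),
    ("value", 8),
    ("reliability", 1),
    ("remarks", 36),
]
CARD_WIDTH = 80


def to_card(record):
    """Format one record as an 80-column card image; reject fields that overflow."""
    fields = []
    for name, width in CARD_LAYOUT:
        text = str(record.get(name, ""))
        if len(text) > width:
            raise ValueError(f"{name!r} does not fit in {width} columns: {text!r}")
        fields.append(text.ljust(width))  # pad with blanks to the fixed width
    card = "".join(fields)
    assert len(card) == CARD_WIDTH
    return card


def from_card(card):
    """Split a card image back into named fields (all values are strings)."""
    record, start = {}, 0
    for name, width in CARD_LAYOUT:
        record[name] = card[start:start + width].strip()
        start += width
    return record


card = to_card({
    "source_doc": "000123",
    "latitude": "37.70N",
    "longitude": "089.20W",
    "year": 1965,
    "factor_code": "LEPTO",
    "value": "4.0",
    "reliability": "M",
    "remarks": "CATTLE, SOUTHERN ILLINOIS",
})
```

Because every field has a fixed position, the keypunch operator needs no interpretation, and an overflow is caught when the form is transcribed rather than when the card is read.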

Figure  8  presents,  in  summary,  our  present  concept  of  the 
sequence  of  events  which  will  provide  for  input  of  data  into  the 
MOD  system. 

Appendices 2, General Data-Analysis Vocabulary, and 3, Factor
Catalog, are directly pertinent to Data Management considerations.
These important documents represent a collaborative effort among
all of the professional MOD workers, but Dr. Guffey was the primary
mover.

FIGURE 6. - A simple disease-data form for use
in extracting data on leptospirosis from appropriate
data sources.


FIGURE 7. - Data-collection form for use in locating
and characterizing already-existing maps which show
the distribution of various environmental factors.

Our experiences with data-management in the context of this
comprehensive program have exposed complexities which we could
not have anticipated. It has become clearly evident that the most
critical factor limiting meaningful computer output in our proposed
system is the content/format of input data. The sources of the
data are readily available, but there are major difficulties in
extracting/formatting these data. Summarizing, the MOD project
data extraction and file generation problems may be categorized
as follows:

(1) Highly varying source document content (requiring
development of a data-analysis vocabulary and a
factor catalog which will establish common
denominators).

(2) Highly varying reliability of raw data (requiring
a system for defining reliability and, on occasion,
validating data).

(3) Continual changes/additions in the data base file
(making unusual requirements for editing/updating).

(4) Lack of a generic vocabulary encompassing medical/
disease/environmental situations (related to item 1).

(5) Inherent complexities in the data which make it
difficult to specify feasible procedure(s) for the
extraction, editing, structuring, and storing of the
data prior to computer input.

(6) Data file design problems due to complexities of the
data in general, its very large potential volume, and
the great number of interrelationships among the
specific data and among descriptions associated with
vocabulary/definitions/volume after computer input.


FIGURE 8. - Possible sequence of events for providing data for the MOD
computer system.

[Flow chart. Recoverable box labels, in order:]

DATA SOURCES: published prose summaries; unpublished prose
summaries; unpublished raw data; published and unpublished maps.

DATA EXTRACTION (medical students and others): extract by
checklist; initial, quick edit.

DATA ANALYST: assure validation and reliability; format for
keypunch; very systematic, comprehensive edit.

[...] AND LIST; DATA ANALYST; CORRECTIONS.

PRESTORE DATA (card to tape, off-line).


DATA-PROCESSING CONSIDERATIONS


An introductory discussion of this phase of the MOD project
appears on page 13.

Capt. Roger Guffey has concentrated on this aspect of the
study, working in collaboration with all members of the group,
but particularly the PRC consultants (Dr. Morenoff, Mr. Ferguson,
and Mr. Richmond) and Mr. Kline. In addition, Capt. Guffey has
effectively bridged the gap between data-management and
data-processing and has made the major contribution in our developing
data format system.

Computer Equipment Requirements: Preliminary system analysis
indicates that the MOD computer system should be capable of
performing the following functions:

1. Input and edit data; display data, including source.

2. Generate and/or augment/modify (including updating
and general maintenance functions) data file(s),
employing the input data.

3. Input and edit queries**; display queries*.

4. Retrieve disease/environmental information from the
data file(s) based on the query set.

5. Perform high-speed sorts.

6. Perform mathematical manipulations and transformations.

7. Generate drive tape for automatic data-plotting
("mapping") device.

8. Generate auxiliary hard-copy (printed) reports.

9. Display contents of any portion of the data file(s).

Of course, facility should be built into the system to
allow any logical combination of the above functions.
Also, all displays and reports should be query-controlled.


* The word "display" is used here to mean
printed output of information in a suitable
(easily understood) format.

** The above-mentioned queries (query set) refer
to staff-specified inputs, including
geographic coordinates/political units,
conversion factors (map scale, map projection,
etc.), and instructions as to what disease/
environmental information is to be plotted
(mapped).

A medium- to large-scale digital computer will be required,
e.g., a CDC 3200/3300, an IBM 7090/7094, an IBM 360, or
larger-scale units. Equipment should include at least four tape drives
plus a disc module with capacity of 33 million + alphanumeric
characters per module. However, a limited system could probably
be designed for such a configuration without the disc module, if
an additional tape drive were available. Of course, usual
peripheral equipment, e.g., card readers, high-speed printers,
etc., will be required, in addition to a plotter, specification
of which will be considered later.

The  Armed  Forces  Institute  of  Pathology  is  scheduled  for 
installation  of  an  IBM  360  Model  30  medium-scale  computer  with 
5  tape  drives  early  in  1967,  and  this  will  satisfy  a  major  part 
of  our  hardware  requirements.  Arrangements  are  in  process  which 
(hopefully)  will  allow  us  to  use  other,  larger,  computers  in  the 
Washington  area,  including  an  IBM  7090/7094.  This  will  allow  us 
to develop and demonstrate capabilities of the MOD system which
are  beyond  the  limits  of  the  360  unit. 


Computer Program Requirements

Figure 9 (next page) presents the functional components
of the MOD computer system and shows their interrelationships.

The extent and complexity of the programs involved in this
operation are obvious. Turning back to Figure 4 (Schedule of
Major Phases ...), page 14, the System Design and System Implementation
Phases are those in which there is to be major effort to write
programs comprising the major subsystems: (1) Data/Query-
Preprocessor/Editor, (2) File-maintenance, (3) Data-retrieval,
(4) Report-generator, (5) Graphic-generator, and (6) Control-
program.

As is evident from Figure 3, extracting data from
data sources and preparing these data for input will go along
with the design of programs to carry out data/query-
preprocessor/editor and file-maintenance subsystems. This is why
our major effort at this time is on mechanisms of data extraction/
formatting, representing a collaborative study that involves both
bio-professional data-management personnel and computer-professional
data-processing personnel; such collaboration is essential to
develop a compatible system. As we approach solution of the data
extraction/formatting problem, programming for data/query-
preprocessing/editing, file-maintenance, and data-retrieval will
begin. As the programs are written, they will be checked out
against the data base file (in relation to hypothetical queries).

As this phase of programming approaches completion, programs will
be developed for the report-generator, graphic-generator, and
control-program.


FIGURE 9. - Organization of MOD computerized disease-
mapping system, as presently conceived.

These units pertain to the computer system, per se, and
do not include essential staff activities. For example,
no feedback is indicated in relation to the several
error reports. These would be evaluated by the staff
and, as necessary, correction of "File Maintenance",
"Data Retrieval", or "Report Generator" would be
initiated by the staff.


The problems involved in computer plotting are greater than
we had originally anticipated, before extensive consultation with
PRC. The problems have proved to be particularly difficult in
the area of contouring because, often, contour lines cannot be
drawn between points of similar arithmetic values, but have to
reflect things of equal significance, and here there are
difficulties in determining actual meanings and in deriving valid
common denominators. This problem requires development of new
criteria, as is apparent from considering the General Data-
Analysis Vocabulary (Appendix 2) and the Factor Catalog (Appendix 3).

The concept of program checkout will be to proceed from the
smallest subprogram checkout to the complete system operation.
Each subsystem will be checked out as completed. System design will
be such that checkout can proceed on each subsystem independently.
When all subsystems are checked out, they can be combined under the
control program into the final system.

The final phase, System Development Testing, has been
separated from the Implementation Phase because it represents a
change in emphasis. At the beginning of this phase the data files
will be filled with actual disease data (as opposed to "manufactured"
test data during Phase III) and queries will be attempted for the
first time. This will allow evaluation of the total system and any
"final" changes before the system becomes fully operational.

Final program documentation, to be prepared for delivery at
the conclusion of the System Development Testing Phase, will consist
of program listing[s and], for each, a functional description,
logical diagram, flow [illegible]. In addition, since the
MOD computer [illegible] capability which is new, a user
manual will be [illegible] methods, procedures, and
language with which the personnel will operate the system.

Some characteristics of the several files and subsystems which
will comprise the MOD computerized disease-mapping system are
already clear, and are described in the following paragraphs.

1. Data File:

The MOD system's objectives are to categorize
[illegible] disease/environmental data in such a manner
that the data base can be queried to retrieve that
portion desired, which can then be plotted (as a map)
or printed (as a hard-copy report) for detailed
examination and analysis.

From previous discussion, it is evident that
design of the focal point of the MOD system,
the data file, is still under development.
However, tentatively, each record (representing
one data point) in the data file is expected to
consist of four sections:

(a) common elements (CEL's), identifying the
data point's geographic location, time
frame, source of the data, etc. (See
Appendix 3).

(b) the disease/environmental factor (a POF or
MOF) studied at that point.

(c) the value (a FAV) determined for that
factor at that location and time.

(d) supporting narrative material (NAR) as necessary.
(See Appendix 2 in relation to b, c, and d.)

The common elements (CEL's), factor (POF or MOF),
and value (FAV) are the items to be queried. The
supporting narrative material (NAR) is included
only to assist in understanding of retrieved data.
It will consist of data which is associated with
the data point but which cannot be meaningfully
mapped or queried.

The data file(s) will probably be maintained on magnetic
tape, but with a flexibility allowing use of disk
(random access) storage if such proves desirable.
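
The tentative four-section record can be pictured with a small data structure. The Python sketch below is an illustration only; the report fixes the four sections (CEL's, factor, FAV, NAR) but not their internal fields, so the attribute names and the sample values here are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class CommonElements:
    """CEL's: where, when, and from what source the data point comes."""
    latitude: float
    longitude: float
    time_frame: str
    source_document: str


@dataclass
class DataPoint:
    """One record of the data file, in the four tentative sections."""
    cels: CommonElements   # (a) common elements
    factor: str            # (b) disease/environmental factor
    value: float           # (c) value (FAV) for that factor
    narrative: str = ""    # (d) supporting narrative (NAR), not queried

    def queryable_items(self):
        """Return only the items open to query: CEL's, factor, and value."""
        items = asdict(self.cels)
        items["factor"] = self.factor
        items["value"] = self.value
        return items


# Hypothetical sample record.
point = DataPoint(
    cels=CommonElements(37.7, -89.2, "1965", "DOC-000123"),
    factor="leptospirosis, cattle, percent positive",
    value=4.0,
    narrative="Sampling method not stated in the source report.",
)
items = point.queryable_items()
```

Keeping the narrative outside the queryable items mirrors the distinction drawn above: it travels with the data point but plays no part in retrieval.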

2. Dictionary File:

A dictionary file will be required, containing the
vocabulary of words, [illegible] synonyms, and a tree-
structured list of [illegible]. While [illegible] can be
put into the dictionary by the program, the
synonyms and [illegible] must be set up
manually. The [illegible] was clear at
the beginning of our [illegible]; the [illegible] of a
tree-structured [illegible]
was not. Fulfillment of this latter requirement has
required a major effort.

3. Data/Query-Preprocessor/Editor Subsystem:

This program will function to insure that the
data file(s) and the query-requests contain
"clean" information. Data from the input forms
presently being designed will be read by the
Data/Query-Preprocessor/Editor. Each data point
will be separated into the categories which match
the file design. It will be necessary for this
program to convert the CEL for geographic location
from longitude-latitude, UTM Grid Coordinates, or
Political Unit into an acceptable computerized form.
Other CEL's may be handled in a free-form
manner. The FAV will be checked for legality and
acceptability, and each LOF will be individually
edited against a continually expanding dictionary to
guarantee its validity. The edited data will be
written onto a magnetic tape for later use in
updating the files.

Editing of the query-request information will be
done in much the same way. The only major
difference will be that the program will completely
define a MOF or POF in such a way as to allow
retrieval of a data point which is associated with
any or all of the associated LOFs, in any desired
combinations. The edited query-request information
will then be saved, internally, for later use in the
data retrieval subsystem. In addition to its other
tasks, the data/query-preprocessor/editor program will
produce a hard-copy report (optional) on the new
data, the query, and any error messages which may have
been generated.
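
The editing steps described for this subsystem can be sketched as follows. This Python fragment is a minimal illustration, not the MOD design: the dictionary contents, the synonym table, the factor tree, the 0-100 range assumed for a percentage value, and all function names are hypothetical, and only the longitude-latitude form of location is handled (UTM and political units are omitted).

```python
# Hypothetical dictionary of accepted terms, with synonyms mapped onto them.
DICTIONARY = {"cattle", "swine", "serologic test"}
SYNONYMS = {"cows": "cattle", "pigs": "swine"}
# Hypothetical tree: a higher-order factor and its associated lower-order terms.
FACTOR_TREE = {"host animal": ["cattle", "swine"]}


def to_decimal_degrees(degrees, minutes, hemisphere):
    """Convert degrees/minutes plus N, S, E, or W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"bad hemisphere: {hemisphere!r}")
    if not 0 <= minutes < 60:
        raise ValueError(f"bad minutes: {minutes!r}")
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


def edit_data_point(raw):
    """Return (clean_record, error_messages) for one raw data point."""
    clean, errors = {}, []
    try:
        clean["lat"] = to_decimal_degrees(*raw["lat"])
        clean["lon"] = to_decimal_degrees(*raw["lon"])
    except ValueError as exc:
        errors.append(str(exc))
    if 0.0 <= raw["value"] <= 100.0:  # assumed legal range for a percentage
        clean["value"] = raw["value"]
    else:
        errors.append(f"value out of range: {raw['value']}")
    clean["lofs"] = []
    for term in raw["lofs"]:
        term = SYNONYMS.get(term.lower(), term.lower())
        if term in DICTIONARY:
            clean["lofs"].append(term)
        else:
            errors.append(f"unknown term: {term}")
    return clean, errors


def expand_factor(term):
    """For a query, expand a factor to its associated lower-order terms."""
    return FACTOR_TREE.get(term, [term])


clean, errors = edit_data_point(
    {"lat": (37, 42, "N"), "lon": (89, 12, "W"), "value": 4.0,
     "lofs": ["Cows", "serologic test"]}
)
```

Errors are collected rather than raised so that a single pass can report everything wrong with a data point, in the spirit of the optional hard-copy error report.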

4. File-Maintenance Subsystem:

This program will perform the updating of the MOD
file(s). The edited data file will contain data
points to be added to the MOD file(s) and data
which modifies existing data points. Three operations
are required: first, simply to add new data points
to the file; second, to modify information for a
data point without otherwise changing the contents of
the data point record; third, to add new words,
synonyms and tree structures to the "dictionary" file.
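
The three maintenance operations can be sketched briefly. The Python below is an illustration under assumptions: the in-memory dictionaries stand in for the tape files, and the update format ("op", "key", "record", "changes", "term", "synonyms", "parent") is hypothetical.

```python
def apply_updates(data_file, dictionary, updates):
    """Apply edited updates, in order, to the data file and dictionary in place."""
    for update in updates:
        op = update["op"]
        if op == "add":
            # First operation: add a new data point.
            key = update["key"]
            if key in data_file:
                raise KeyError(f"data point already exists: {key}")
            data_file[key] = dict(update["record"])
        elif op == "modify":
            # Second operation: change named items, leave the rest untouched.
            data_file[update["key"]].update(update["changes"])
        elif op == "add_term":
            # Third operation: add a word, its synonyms, and its tree position.
            entry = dictionary.setdefault(
                update["term"], {"synonyms": set(), "parent": None}
            )
            entry["synonyms"].update(update.get("synonyms", ()))
            if "parent" in update:
                entry["parent"] = update["parent"]
        else:
            raise ValueError(f"unknown operation: {op!r}")


data_file, dictionary = {}, {}
apply_updates(data_file, dictionary, [
    {"op": "add", "key": "P1",
     "record": {"factor": "leptospirosis", "value": 4.0}},
    {"op": "modify", "key": "P1", "changes": {"value": 5.0}},
    {"op": "add_term", "term": "cattle", "synonyms": ["cows"],
     "parent": "host animal"},
])
```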

5. Data-Retrieval Subsystem:

This program will perform the retrieval of information
from the MOD file(s) in accordance with queries input
by users of the system. The program will use the
query after it has been preprocessed and structured in
the Data/Query-Preprocessor/Editor. The MOD file(s)
will be read and checked against the query requests,
and those data points which match the requests will be
retrieved and written onto magnetic tape for final
processing.
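
Retrieval as described amounts to a sequential pass over the file, keeping the records that satisfy the query. The following Python sketch illustrates this under assumptions: the record fields and the query form (a set of factors plus latitude and longitude ranges) are hypothetical.

```python
def matches(record, query):
    """True if the record satisfies every condition in the query."""
    south, north = query["lat_range"]
    west, east = query["lon_range"]
    return (
        record["factor"] in query["factors"]
        and south <= record["lat"] <= north
        and west <= record["lon"] <= east
    )


def retrieve(data_file, query):
    """Scan the file once, in order (as from tape), yielding matching records."""
    for record in data_file:
        if matches(record, query):
            yield record


# Hypothetical file contents and query.
data_file = [
    {"factor": "leptospirosis", "lat": 37.7, "lon": -89.2, "value": 4.0},
    {"factor": "leptospirosis", "lat": 41.9, "lon": -87.6, "value": 1.0},
    {"factor": "rainfall", "lat": 37.5, "lon": -89.0, "value": 44.0},
]
query = {
    "factors": {"leptospirosis"},
    "lat_range": (36.9, 38.5),
    "lon_range": (-90.5, -87.5),
}
retrieved = list(retrieve(data_file, query))
```

A generator suits tape-like storage: each record is examined once as it passes, and nothing but the matches need be held for the next stage.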

6. Report-Generator Subsystem:

This program will provide the capability for all printed
reports desired; it will also catalog or list the data
files and dictionary file. Reports will be controlled by
the user via report request cards. Capabilities will be
provided to display portions of the files in addition to
cataloging the information (as, for example, the latest
additions to any of the files) to supplement plotted data.

7. Graphic-Generator Subsystem:

This subsystem will extract the latitude, longitude, and
factor-value from the retrieved-data file, and convert it
to rectangular (x,y,z) coordinates. It is our present
plan to have the program construct a rectangular grid,
interpolating to fill in z-values at all (x,y)
grid-line intersections which lack observed z-values.
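
The gridding step can be illustrated as follows. The report does not name an interpolation method, so the Python sketch below uses inverse-distance weighting purely as an assumed stand-in; the function name, its parameters, and the sample observations are hypothetical, and the points are taken to be already in rectangular (x,y) form.

```python
def idw_grid(observations, xs, ys, power=2.0):
    """Fill a rectangular grid by inverse-distance weighting.

    observations: list of (x, y, z) tuples; xs, ys: grid-line coordinates.
    Returns grid[j][i], the z-value at the intersection (xs[i], ys[j]).
    """
    if not observations:
        raise ValueError("at least one observation is required")
    grid = []
    for y in ys:
        row = []
        for x in xs:
            weighted_sum = weight_total = 0.0
            exact = None
            for ox, oy, oz in observations:
                d2 = (x - ox) ** 2 + (y - oy) ** 2
                if d2 == 0.0:
                    exact = oz  # an observed value is kept unchanged
                    break
                weight = 1.0 / d2 ** (power / 2.0)  # 1 / distance**power
                weighted_sum += weight * oz
                weight_total += weight
            row.append(exact if exact is not None else weighted_sum / weight_total)
        grid.append(row)
    return grid


# Two hypothetical observations on a three-point grid line.
grid = idw_grid([(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)], xs=[0.0, 1.0, 2.0], ys=[0.0])
```

In this example the middle intersection lies equally far from both observations, so it receives their simple average.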

After gridding is completed, additional control information
would be read in (via the Control Program) to
define further the maps requested. This control information
would include: (a) name of region mapped,
(b) scale and size of map, (c) boundaries of the region
mapped, (d) type of map projection, (e) such "legend"
data as date prepared, requestor's name, security
classification, general description (title) of data, etc.,
(f) method of representing data on map, e.g., dots,
shadings, or contours. Then, any of several existing
contouring routines could be used to calculate the contours,
and the selected routine incorporated as additional control
information. Further processing of the data could be
performed at this stage before requesting the subsystem to
produce a magnetic tape, containing appropriate instructions
to drive an automatic plotter (ordinarily an off-line
activity).

A survey of plotters potentially useful in the MOD
system is included as Appendix 4 of this report.

FIGURE 10. - Possible MOD computer-system applications


8. Control Subsystem:


In a system as complex as this, a control program is
virtually a requirement. This subsystem will function to
coordinate the operation of all the other subsystems.
It will read all of the control information and determine
the proper subsystem to be called in at the appropriate
time. Such an operation will minimize procedural
errors and the necessity for computer operator
intervention, and will speed computer processing. Thus it
will maximize efficiency of total system operation.
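
The coordinating role of the control program can be sketched as a simple dispatcher. The Python below is an illustration only: the control-card format, the subsystem names, and the two stand-in subsystems are hypothetical, and each subsystem is reduced to a function that takes and returns a working state.

```python
def run_job(control_cards, subsystems):
    """Call each requested subsystem in order, passing the results along."""
    state, log = {}, []
    for card in control_cards:
        name = card["subsystem"]
        if name not in subsystems:
            raise ValueError(f"unknown subsystem on control card: {name!r}")
        state = subsystems[name](state, card.get("options", {}))
        log.append(name)  # record of what ran, for the operator
    return state, log


# Stand-in subsystems (hypothetical), each returning an updated state.
def preprocess(state, options):
    return {**state, "edited": True}


def retrieve_data(state, options):
    return {**state, "retrieved_factor": options.get("factor")}


subsystems = {"preprocessor": preprocess, "retrieval": retrieve_data}
state, log = run_job(
    [
        {"subsystem": "preprocessor"},
        {"subsystem": "retrieval", "options": {"factor": "leptospirosis"}},
    ],
    subsystems,
)
```

Rejecting an unknown subsystem name before anything runs out of order is one way such a program could reduce procedural errors and operator intervention.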


Data-Processing Personnel Requirements: Several additional
data-processing-oriented persons will be required to design,
implement, and develop/test the MOD computer system, as indicated
in Figure 5, page 16. It is virtually impossible to hire high
quality personnel of this sort on short notice for work of
relatively short duration. Consequently, the realistic approach
is to get them by subcontracting. PRC, in view of its very
effective previous work on the MOD Project (and the understanding
which it has developed), is the obviously preferred concern for
future subcontracts.

In view of the scheduling requirements necessary to complete the
MOD Project by 15 November 1968, the following persons are
considered to represent a minimal requirement:

1. Computer-systems analyst (in-house, UAREP) - will
assist in the system analysis and design, and
will implement the graphic-generator and control
subsystems.

2. Subcontractor Senior Project Manager (PRC) - will
perform overall technical and administrative
supervision of various phases of the project,
prepare and deliver briefings as required by
UAREP-AFIP, maintain cognizance of related efforts
performed by other Governmental Agencies, perform systems
analysis, and aid in the systems design effort.

3. Subcontractor Deputy Project Manager (PRC) - will
be responsible for technical and administrative supervision
on a day-to-day basis for the various phases of the
project. He will also prepare and deliver briefings as
required by UAREP-AFIP, maintain daily liaison with the
client to insure that all needs are satisfied, inspect
the progress of programmers to assure that contractual
demands and deadlines are being met, perform systems
analysis, and aid in the systems design effort.

4. Computer Programmer/Analyst (PRC) - will assist in the
systems analysis and design and data file design. He
will also implement the data/query-preprocessor/editor.
He will assist in integrating this with the control
program. (This person will be more experienced than
the next-described programmer, inasmuch as his duties
are more demanding.)

5. Computer Programmer (PRC or UAREP) - will implement the
file-maintenance, data-retrieval, and report-generator
subsystems. He will also assist in integrating these
with the control subsystem.

6. Automatic Data-processing Support personnel, e.g.,
keypunch operators and computer/plotter operators -
will be required to supplement the work of the
previously listed personnel.

The UAREP (in-house) computer-systems analyst and also, possibly,
the subcontractor deputy project manager will be involved with
maintenance and other functions to be performed during the final
System Development Testing Phase.

Plotter Equipment Considerations: Computerized maps* can be
produced in several ways:

(1) by high-speed (line-) printers,

(2) by ink-on-paper-type plotters, either flatbed or drum,
and either analog or digital,

(3) by cathode-ray-tube (CRT) devices, either with direct
recording on microfilm or visual display on a screen.
Either high-speed printers or cathode-ray-tube (CRT) display screens
could be used to obtain a quick, overall, low-resolution view of
what a particular map would look like. Then, if the user decided
that the particular map would be useful to him, he could request
that the map be plotted in detailed, high-resolution fashion by
an ink-on-paper plotter.

At the present state of their development, CRTs could yield
only low resolution maps (not more than 1000 points per 15 linear
inches). High speed line printers, in addition to producing high
resolution maps, are able to provide much the same "oversight"
information quickly. The advantages of the "quick look" would be
that one could determine whether the data under consideration was
actually worth the cost in time and effort of high resolution
mapping.

We have not dropped altogether our interest in CRTs because
developments within the next two years could lead to marked
improvement in their resolution. The advantages of high-resolution
cathode-ray-tube imagery are great: they could provide maps very
quickly (seconds). In addition, they could provide a continuous
range of scales, allowing the user to zoom in, so to speak, on a
geographic area of particular interest.


*  In  the  context  of  this  program,  a  map  is  considered  to  be  a 
graphic  representation  of  data  distributed  meaningfully  in 
relation to geographic coordinates. More often than not,
the  significance  of  the  data  which  we  produce  by  computer 
will  not  become  apparent  unless  it  is  plotted  on  a  base  map 
or  used  as  an  overlay  on  an  existing  map. 
A preliminary survey of automatic plotting devices
potentially available to the MOD Project has been carried out
jointly by UAREP-AFIP and PRC personnel. The approach was
twofold: (1) several agencies in the Washington, D.C., area
were visited by the study team, and the capabilities of the agencies
were discussed (with their personnel), taking into consideration
the problem areas and areas of specific correlation to MOD goals;
(2) the various computers/plotters available were considered in
terms of software requirement and plotter speed/accuracy.

In addition, a detailed survey/analysis of existing plotters
was carried out by Mr. Kline, results of which are presented as
Appendix 4.

At the present time, it appears that either a medium-size
Gerber (series 600-1000) or Calcomp (series 600-800) plotter would
best suit the needs of the MOD system. It is recommended also that
a "flatbed" type of plotter be employed (as opposed to a "drum"
type). This is because of the increased versatility of reproduction
techniques that a flatbed allows, e.g., direct photographic
recording, etching, etc., which could be an important advantage if
a large number of copies of a particular map were to be produced in
a short time.

Turning to software requirements, Calcomp plotters come
equipped with a macro-type language program which facilitates fast
and efficient programming of a "drive tape". The actual advantage
of this is not so great as it first appears, however, since
Gerber's instruction set (micro) is less complicated than Calcomp's.
Furthermore, Gerber flatbed plotters are generally more accurate
than Calcomp plotters. However, highly accurate machines may not
be essential for the MOD system.

With regard to plotter availability in Washington, the study
team came to the conclusion that there should be no significant problems.
Several of the agencies visited have plotter time which would be
available to the MOD project at low cost. Plotter availability
will be considered further at the time detailed system design gets
under way.
Output Considerations - The primary output from the MOD computerized
system will consist of computer/plotter-drawn maps showing
the geographic distribution of various disease/environmental
situations. Before discussing map-outputs in detail, however,
it is appropriate to mention several other kinds of output which
will also be desirable - and possible:

First, for purposes both of computer-system
development and check-out, we should have an
ability to print out, in readily-understood
format, the contents of any given part of the MOD
data and dictionary files.

Second,  for  similar  reasons,  we  have  need  to  print 
out  the  query  set  being  used  to  generate  other 
(visual  map-form)  outputs. 

Third, printed listings of disease/environmental data
which display certain significant interrelationships
would be of potential importance since these would
suggest particular map output. Also, results of
statistical correlation tests made between various
disease/environmental data should be printed out.

The Fourth kind of output is rather complex. When a
person compares and analyzes the data patterns
displayed on several maps (in order to determine
relationships among the disease/environmental factors
mapped), he goes through a set of procedures which
might be approximated/imitated by the computer system.
If these procedures could be described by a suitable
analog, the computer system itself could compare and
analyze the data which, otherwise, it would have
output as several different maps. This would allow it
to output, in some fashion, a description of those
relationships. Such a capability, even if very
elementary, could be useful in deciding what particular
disease/environmental factors should be subsequently
mapped. Also, it would allow changes (increases or decreases)
in the disease/environmental situation to be readily
displayed.

The Fifth kind of output concerns an area of medical
interest that may have important ramifications, an area
discovered during the System Analysis Phase. The
anticipated data file base, including data relevant to
a wide spectrum of disease/environmental situations,
allows generation of graphic displays which might yield
important etiological patterns and/or mathematical
relationships. Figure 11 illustrates a hypothetical
relationship of this type: elevation versus period
prevalence plotted over a range of five-degree
temperature spans. (This can be looked at,
mathematically, as a family of curves.) There is the
possibility of producing virtually an infinite variety
of such three-factorial relationships. Extensions of
this concept could, conceivably, allow predictions as
to the probability that a specific geographic area
would have a specific disease incidence over a specific
time, suggesting specific preventive measures.

FIGURE 11. -- Possible graphic display which might be able
to be generated employing the MOD system's data files.


Carrying this concept a step further, one could achieve
a two-dimensional relationship of four-factorial
functions.
Associated with this concept is the idea of overlaying,
on the same basic grid (using transparent material),
several disease/environmental data elements. This
would yield intersections of curves, the intersection
points of which would probably have important
significance.


Furthermore, in considering causal factors of disease,
a mathematical relationship might be approximated
among the variable items. If this were accomplished,
altering these (in succession) might well give new
insight into the precise effects of the altered
variables, exposing the critically important ones.
Sixth, and finally, we may wish to compare the disease
data in the MOD computer-system files with environmental
data already presented in map form. This could be done
in one of four ways: (a) the latter data could be
digitized and input to the MOD system; (b) the
already-existing map could be redrawn (manually) to a
different (appropriate) scale and projection, and used
as a base map; (c) the map could be photographically
reduced or expanded to a different (appropriate) scale
and used as a base map; (d) the map might be used
directly, without change, as a base map. Decision as to
the most appropriate choice will depend a good deal upon
how flexible the output capabilities of the MOD system
are.

Returning to the major output requirement of the MOD system -
to produce (by computer/plotter techniques) maps which display the
geographic distribution of certain disease/environmental situations -
this has been approached in three ways:

(a) by a general investigation of presently existing
cartographic techniques, both manual and automated;

(b) by a manual plotting of actual (and manufactured)
disease/environmental data;

(c) by computerized plotting of actual (and manufactured)
disease data using a commercially available computer
routine.

(Approaches (b) and (c), above, represent pilot studies involving
the kind of data which will eventually be put into the MOD
system's data files for use in generating maps.)

There are three general categories of problems which must be
successfully handled by the MOD computer/plotter system if it is
to function effectively in its "mapping of disease". These
problem categories, common to fabricating any map, involve:
(1) map size and scale; (2) map projection; (3) method of
representing data on the map.

The size and scale of any given map to be produced by the
MOD computerized mapping system will be determined by two
considerations:

First, the size of the map will be limited by plotter
capacity (in terms of paper size); second, the scale of
the map will depend upon the actual size of the region to
be mapped, i.e., whether that region is the entire world,
south-east Asia, or southern Illinois, in relation to the
size of the maps.
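
The relation between plotter capacity and region size can be illustrated with a short calculation. The sketch below is not part of the MOD system: the function name, the 30-inch plotting width, and the region extents are hypothetical values chosen for illustration, and the figure of about 69.17 statute miles per degree of longitude at the equator is an approximation.

```python
import math

MILES_PER_DEGREE_AT_EQUATOR = 69.17  # approximate statute miles per degree
INCHES_PER_MILE = 63360

def scale_denominator(lon_span_deg, mid_lat_deg, paper_width_in):
    """Return N such that the map scale is 1:N when the region's
    east-west extent just fills the available paper width."""
    # Ground distance shrinks with the cosine of latitude.
    ground_miles = (lon_span_deg * MILES_PER_DEGREE_AT_EQUATOR
                    * math.cos(math.radians(mid_lat_deg)))
    ground_inches = ground_miles * INCHES_PER_MILE
    return ground_inches / paper_width_in

# Hypothetical comparison: whole world versus a 3-degree-wide region
# near 38 degrees north, both drawn on an assumed 30-inch-wide sheet.
world = scale_denominator(360, 0, 30)
small_region = scale_denominator(3, 38, 30)
print(f"world: 1:{world:,.0f}   small region: 1:{small_region:,.0f}")
```

For a fixed paper size, the smaller the region, the larger the scale (the smaller the denominator) at which it can be drawn.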
The map projection to be used for a particular map can be
varied by introducing appropriate (relatively simple) programs
which will alter the grid pattern onto which specific data will
be plotted. Although any mathematically-definable projection
could be used by the system, four projections have been
tentatively selected as standard: equirectangular (the standard),
Goode's homolosine, Miller's cylindrical, and Mercator's
projections. The equirectangular projection has the advantage that
it is equivalent to a simple rectangular coordinate (x,y) grid,
and this allows relatively simple programming. Goode's homolosine
projection has the significant advantage of being an equal-area
projection. Mercator's and Miller's cylindrical projections
have the advantage of wide familiarity. Furthermore, many already-
existing maps that show distributions of disease/environmental
factors are drawn according to these projections.
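
To show why the equirectangular projection allows "relatively simple programming", the following sketch gives the textbook forward formulas for three of the four projections on a sphere. It is an illustration only, not the MOD project's code: the function names and the unit-radius default are assumptions, the equirectangular case assumes the equator as standard parallel, and Goode's homolosine is omitted because its interrupted, composite construction does not reduce to one short formula.

```python
import math

def equirectangular(lon_deg, lat_deg, radius=1.0):
    """Longitude and latitude map directly onto a rectangular (x, y) grid."""
    return radius * math.radians(lon_deg), radius * math.radians(lat_deg)

def mercator(lon_deg, lat_deg, radius=1.0):
    """Spherical Mercator; y grows without bound toward the poles."""
    phi = math.radians(lat_deg)
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return radius * math.radians(lon_deg), y

def miller(lon_deg, lat_deg, radius=1.0):
    """Miller cylindrical: Mercator computed at 0.8 of the latitude,
    then stretched by 1.25, so the poles stay finite."""
    phi = math.radians(lat_deg)
    y = radius * 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * phi))
    return radius * math.radians(lon_deg), y

# The same point plotted under each projection.
for project in (equirectangular, miller, mercator):
    x, y = project(-88.5, 45.0)
    print(f"{project.__name__:16s} x={x:.4f} y={y:.4f}")
```

All three share the same x coordinate; only the spacing of the parallels differs, which is why switching among them amounts to altering the grid pattern onto which the data are plotted.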

There  are,  for  our  purposes,  three  basic  methods  of 
representing  disease/environmental  data  on  distribution  maps. 

One  may  use  dots,  shading,  or  contours  -  singly  or  in  various 
combinations. 

Dot-type maps are illustrated in figures 12 and 13. This type
of map can be made quite simply by computerized techniques, and
very effectively presents some kinds of disease/environmental data.

Shading-type maps are shown in figures 14 and 15. Although
this type of map is quite easy to produce manually, it presents
serious technical problems from a computer viewpoint; we are still
working to overcome these problems.

Contour-type mapping is illustrated in figures 16 and 17.
If the data presented have a fairly uniform distribution, falling
into a regular, rectangular grid pattern, they can be contoured
by existing computer techniques without much difficulty. If the
data are randomly distributed (as is usually the case), these same
computerized contouring techniques can be utilized, but the data
must first be fitted into a rectangular grid by means of various
averaging and interpolating techniques. (Obviously, a computerized
contour map could easily and quickly be converted to a shaded-type
map, manually.) Combinations of dot-type, shading-type, and
contour-type mapping are illustrated in figures 18 and 19.

Figures 12, 14, 16, 18, and 20-23 were drawn/plotted using data
which  we  extracted  and  formatted  for  the  specific  purpose.  The 
resulting  maps,  representing  the  first  effort  of  this  sort,  are, 
admittedly,  far  from  perfect.  But  they  are  prototypes  and,  as 
such,  represent  an  important  achievement  since  they  demonstrate 
that  meaningful  disease-distribution  maps  can  be  produced  by  the 
FIGURE 12. - Some actual disease data presented as a dot-type
map drawn manually; these data were extracted, prepared,
and mapped as part of the MOD Project effort.

FIGURE 13. - Examples of computer-produced dot-type maps.


FIGURE 14. - The same disease data as Fig. 12, but presented
as a shading-type map drawn manually.




FIGURE 15. - An example of a shading-type map produced
by computer.

FIGURE 16. - The same disease data as Fig. 12, but presented
as a contour-type map drawn manually.


FIGURE 17. - Examples of computer-produced contour-type maps.


[Map title: INFECTION RATE (%) OF SCHISTOSOMIASIS DUE TO
SCHISTOSOMA MANSONI IN HUMANS]


FIGURE 18. - The same disease data as Fig. 12, but presented
as a manually-drawn map utilizing dot-, shading-, and
contour-mapping techniques.


FIGURE 19. - Example of a computer-produced map using
shading (the alternating bands of white and
black figures) and contour (the boundaries between adjacent
black and white bands) technique.


[Map title: INFECTION RATE OF SCHISTOSOMIASIS DUE TO
SCHISTOSOMA MANSONI IN HUMANS]


computerized mapping system*.

These disease data were contoured manually (figures 16 and 18)
by different members of the AFIP-UAREP-PRC study team. The data
were also contoured automatically (figures 20-23) at the Rockville
Data Center of the Control Data Corporation, using a commercially
available computer program: processing by a CDC 3600 computer
followed by off-line plotting on a Calcomp drum-type plotter. The
reported (actual) disease data were combined with a variety of
estimated data in order to explore various techniques for
presenting the data and to evaluate the effects of various
limitations. The data actually reported covered only provinces in
Brazil, Surinam, and parts of Venezuela. Additional points were
estimated for all other South American provinces (based on actual
data for neighboring regions - an interpolative process, in a
sense). A set of coastal points and a set of oceanic points (each
with a zero value) were also prepared.


****** 


Essentially, three tasks are performed by the CDC programs.
The first is to add points in areas of sparse distribution; a linear
interpolation is made between two adjacent data points to add
control points at levels which do not exist in the original data.
The second task is to calculate grid or mesh point values for a
rectangular grid. The method used in determining each grid value is
to find the nearest known control points which surround the grid
point and then to calculate the value by an inverse distance
function. Calculation of grid values is done only when control is
present. The third task is to contour automatically the gridded
data. These data are expressed in x, y coordinates with a z value


* The word, system, used in this context is by no means a
synonym for software/hardware components; data management
aspects also form an essential "component". As a matter
of fact, our most significant achievements toward
producing computer/plotter output maps relate to the
General Data-Analyses Vocabulary and the Factor Catalog
presented as Appendix 2 and 3, respectively.
for contouring. The control values are stored within a matrix
through which the program traces, interpolating to find the
points through which the contour lines pass. The contouring is
performed in strips of two adjacent rows of the matrix.
Contouring is not performed unless control points exist. The
results of two parabolic interpolations are traced to compute
the path of each contour line. As positions are calculated,
plotter commands are stored in an internal array, and output onto
the plotter drive tape each time the array is filled. Optionally,
the location of the data points, values and the grid lines may be
plotted.
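
The gridding step (the second task) can be sketched as follows. This is an illustrative inverse-distance calculation under stated assumptions, not the CDC program itself: the report does not give the number of control points used, the exponent of the distance function, or the test for points that "surround" a grid point, so the names `idw_value` and `fill_grid`, the default of four nearest points, and the squared-distance weighting are all hypothetical choices.

```python
import math

def idw_value(gx, gy, controls, k=4, power=2.0):
    """Estimate z at grid point (gx, gy) from the k nearest control
    points, each given as (x, y, z), weighting by 1 / distance**power.
    Returns None when no control is present."""
    nearest = sorted(controls,
                     key=lambda c: math.hypot(c[0] - gx, c[1] - gy))[:k]
    if not nearest:
        return None
    weighted_sum = weight_total = 0.0
    for x, y, z in nearest:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return z  # grid point coincides with a control point
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total

def fill_grid(controls, xs, ys, k=4, power=2.0):
    """Return rows of grid values, one row per y in ys."""
    return [[idw_value(x, y, controls, k, power) for x in xs] for y in ys]

# Two control points; the grid point midway between them gets their mean.
controls = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_value(1.0, 0.0, controls))
```

Because every grid value is a weighted average of nearby controls, the estimated surface never exceeds the range of the input values, which is one reason the choice of added zero-valued coastal and oceanic points matters to the resulting contours.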

The size of the grid is a variable which can be entered as an
additional requirement, supplementing the program. When a coarse
(large-mesh) grid size was specified, the data were averaged
together, and significant points were lost. However, the broad
trends in the data remained clearly evident, particularly when
both the reported disease-data points and the estimated-to-be-zero
data points (for the other provinces) were processed (Fig. 20).
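
The loss of significant points under a coarse grid can be demonstrated with a small averaging sketch. The function name `bin_average`, the sample points, and the cell sizes are hypothetical and chosen only to make the effect visible; the CDC program's actual averaging rule is not described in this report.

```python
from collections import defaultdict

def bin_average(points, cell_size):
    """Average the z values of all (x, y, z) points falling in the same
    square grid cell; returns {(column, row): mean z}."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical data: one high value next to a zero value.
points = [(0.5, 0.5, 0.0), (1.5, 0.5, 90.0), (3.5, 3.5, 10.0)]
fine = bin_average(points, cell_size=1.0)    # each point keeps its own cell
coarse = bin_average(points, cell_size=2.0)  # the first two points merge
print(max(fine.values()), max(coarse.values()))
```

With the fine mesh the peak of 90 survives; with the coarse mesh it is averaged with its zero-valued neighbor and only the broad trend remains.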

When a fine (small-mesh) grid was specified, resulting in no
averaging or loss of data points, the contours produced depended
upon the choice of data points processed. Reasonable maps were
obtained with reported data points alone (Fig. 21), and with
reported plus oceanic data points (Fig. 22). However, when reported,
estimated, coastal, and oceanic data points were all included,
very unrealistic contours resulted (Fig. 23), perhaps due to the
very large number of data points which would have had to be
generated in order to fill the grid.

Our analysis of these maps indicates that the CDC method of
computing the "fill-in" data points is unsatisfactory for the MOD
project. This problem is still under investigation, and we plan
to use other available contouring routines, e.g., the program
available at the Naval Oceanographic Office. We plan also to
investigate programs utilizing Tobler's method of completing the
grid. Tobler's method, simply stated, is to use only the three
closest points which surround the grid point in question through
which to fit a plane. The grid point therefore lies on this plane
and its value may be computed.
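
The plane-fitting idea just described can be sketched in a few lines. This is an illustration of the geometry only, not Tobler's published routine: the function name `plane_value` is hypothetical, and for brevity the sketch simply takes the three closest control points without testing that they surround the grid point.

```python
import math

def plane_value(gx, gy, controls):
    """Fit a plane through the three control points (x, y, z) closest to
    (gx, gy) and return the plane's z at that grid point. Returns None
    if fewer than three points exist or the three are collinear."""
    if len(controls) < 3:
        return None
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = sorted(
        controls, key=lambda c: math.hypot(c[0] - gx, c[1] - gy))[:3]
    # Two edge vectors of the triangle; their cross product is the
    # normal (nx, ny, nz) of the plane through the three points.
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        return None  # collinear points do not define a unique plane
    # Solve nx*(x - x1) + ny*(y - y1) + nz*(z - z1) = 0 for z.
    return z1 - (nx * (gx - x1) + ny * (gy - y1)) / nz

# Three points lying on the plane z = 2x + 3y + 1.
controls = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0)]
print(plane_value(0.25, 0.25, controls))
```

Unlike an inverse-distance average, the fitted plane reproduces a linear trend in the data exactly, at the cost of using only three points per grid value.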

In conclusion, the actual mapping efforts which have been
described above show that computer contouring of disease/environmental
data can be successfully done when properly selected and
characterized data can be given quantitative values. This
important restriction brings us back once again to the importance
of data management considerations.
CONCLUSIONS AND RECOMMENDATIONS


The foregoing pages detail the accomplishments of the first
year's work on the computerized Mapping of Disease (MOD) Project.
These accomplishments are summarized on Page three of this
report.

In addition to presenting accomplishments of the past year,
the report discusses, in detail, the equipment, programs, and
personnel which will be required to develop the MOD system to
a fully operational capability during the next two years, and
considers potential output of the fully operational system.

We sincerely believe that the project is progressing well,
that the likelihood of success is great, and that efforts to
complete it should be made along the lines indicated in this
report.
GEOGRAPHIC DISTRIBUTION OF INFECTIOUS DISEASES

EXPENDITURES REPORT
November 15, 1965 - November 14, 1966

DIRECT COSTS:

Direct Salaries ......................................  $ 8,267.45  (1)
Social Security, Group Insurance and Other
   Fringe Benefits ...................................      527.60  (1)
Travel ...............................................      130.44
Equipment ............................................    4,607.65  (2)
Communications .......................................       43.08
Consulting Fees ......................................       50.00
Supplies and Service .................................      950.82
Books and Periodicals ................................      414.49
Subcontract - Planning Research Corporation ..........   24,769.25  (3)

TOTAL DIRECT COSTS ...................................  $39,760.78

INDIRECT COSTS:

Indirect Charges .....................................    3,140.28  (4)

TOTAL COSTS ..........................................  $42,901.06




NOTES:

1. Schedule of Personnel attached (Schedule 1)

2. Equipment: 1 Flexowriter with Desk Assembly .....  $ 2,895.00
              1 Separator ..........................    1,600.00
              1 Single Faced Filing Unit ...........      112.65

                                                       $ 4,607.65
NOTES: (Continued)

4. Audit performed by DCAA which established overhead rate of
37.79% of Salaries and Wages for the period ending December 31,
1965. Present billable provisional overhead rate is 38% of
Salaries and Wages.

5. This report includes all costs incident to performance of the
contract for the period November 15, 1965 through November 14,
1966.

6.  Supporting  vouchers  and  other  documents  on  file  are  available 
for  audit. 


(Schedule 1)
PERSONNEL
November 15, 1965 - November 15, 1966

                                                             FRINGE
NAME                   POSITION                SALARY        BENEFITS

Margaret L. Chu        Research Librarian      $5,175.00     $387.40
Shirley K. Eisenberg   Asst. Res. Librarian     1,015.05       52.95
Gary G. Gullet         Research Assistant         514.90       21.63
Harold M. Kline        Systems Analyst          1,562.50       65.62

TOTALS ......................................  $8,267.45     $527.60
APPENDIX 1


ORGANIZATIONS CONSULTED DURING
FIRST YEAR OF MOD PROJECT
IN RELATION TO DATA AND/OR DATA PROCESSING

American Geographical Society
American University, Dept. of Geography
Arco Corp.
Atlantic Research Corp.
Auerbach Corp.
Benson-Lehner, Inc. (B-L Plotters)
BioSciences Information Service of Biological Abstracts (BIOSIS)
Bowman-Gray School of Medicine, Pathology Records Retrieval
   Program
Bunker-Ramo Corp.
California Computer Products, Inc. (CalComp Plotters)
Catholic University of America, Dept. of Geography
Chemical Abstracts Service
Control Data Corp. (CDC)
Electronic Associates, Inc. (EAI Plotters)
FIJ/., Inc.
General Motors Corp., Allison Div.
George Washington University, Dept. of Geography
Georgetown University, School of Medicine
Geo-Space Corp.
Gerber Scientific Instrument Co. (Gerber Plotters)
Harvard University, Laboratory for Computer Graphics
Howard University, Dept. of Geology and Geography
Illinois Natural History Survey
Illinois State Geological Survey
Indiana University, Dept. of Astronomy and Dept. of Geology
International Business Machines Corp. (IBM)
London School of Hygiene and Tropical Medicine,
   Dept. of Parasitology
McLean Paleontological Laboratory
Planning Research Corp. (PRC)
RAND Corp.
System Development Corp. (SDC)
Systems Research Group, Inc. (SRG)
Thailand Govt., Royal Thai Army, Medical Service
United Kingdom Govt., Ministry of Overseas Development,
   Dept. of Technical Cooperation
U.S. Govt.:
   Dept. of Agriculture, Washington Computer Center
   Bur. of the Census, Computer and Data-Processing Dept.
   Central Intelligence Agency (CIA), Medical Division
   Clearinghouse for Federal Scientific and Technical
      Information (CFSTI)
   Dept. of Defense:
      Aeronautical Chart and Information Center (ACIC)
      Air Force Technical Applications Center (AFTAC)
      Armed Forces Institute of Pathology (AFIP),
         Automatic Data Processing Section
      Armed Forces Pest Control Board (AFPCB)
      Army Map Service (AMS)
      Army Materiel Command (AMC), Foreign Science and
         Technology Section
      Army Materiel Command (AMC), Systems Development
         and Design Division
      Army Natick Laboratories, Earth Sciences Division
      Army Research Office (ARO)
      Defense Documentation Center (DDC)
      Defense Intelligence Agency (DIA)
      Military Entomological Information Service (MEIS)
      Naval Command Systems Support Activity (NAVCOSSACT)
      Naval Oceanographic Office (NAVOCEANO)
      Naval Weapons Laboratory (NWL)
      Walter Reed Army Institute of Research (WRAIR)
   Geological Survey
   Library of Congress, Map Division
   Library of Congress, National Referral Center for
      Science and Technology
   National Aeronautics and Space Administration (NASA),
      Goddard Space Flight Center
   National Bureau of Standards (NBS), Center for Computer
      Science and Technology and Computer Sharing Exchange
   National Institutes of Health (NIH), Division of
      Computer Research and Technology, Environmental Health
      Division, National Cancer Institute, and National
      Institute of Allergy and Infectious Diseases
   National Library of Medicine (NLM)
   National Oceanographic Data Center (NODC)
   Public Health Service (PHS), Communicable Disease Center (CDC)
   Smithsonian Institution, Natural History Museum,
      Dept. of Invertebrate Paleontology and Dept. of
      Vertebrate Zoology
   Weather Bureau
UNIVAC Division (of Sperry-Rand Corp.)
University of Buffalo, School of Medicine, Computer Center
University of Illinois, Dept. of Computer Science,
   Dept. of Forestry, Dept. of Geography, Division of Human
   Etiology, School of Veterinary Medicine, Center for Zoonoses
   Research
University of Kansas, State Geological Survey of Kansas
University of Maryland, Dept. of Geography and School of
   Medicine
University of Michigan, Dept. of Geography
University of Missouri, College of Medicine, Computer Center
Woodard Research Corp.
APPENDIX 2


GENERAL DATA-ANALYSES VOCABULARY


LOF (Low-Order Factor) ... The most specific name
   or description of a particular
   disease/environmental situation.
   Examples: Point prevalence, period
   prevalence, incidence, leptospirosis,
   schistosomiasis, L. pomona, L. canicola,
   L. hardjo, S. mansoni, S. japonicum,
   raccoons, skunks, foxes, isolated
   from urine, isolated from blood,
   serologic tests, etc.

MOF (Middle-Order Factor) ... Definition: The set of all LOFs which
   describe the same aspect of disease/
   environmental situations.
   Examples:

   MOF                             LOFs Making Up the MOF

   Kind of epidemiologic index     Point prevalence, period
                                   prevalence, incidence

   General kind of disease         Leptospirosis, schistosomiasis

   Specific disease agent          L. pomona, L. canicola,
                                   L. hardjo, S. mansoni,
                                   S. japonicum

   Animal hosts involved           Raccoons, skunks, foxes

   Method of diagnosis             Isolated from urine, isolated
                                   from blood, serologic tests

HOF (High-Order Factor) ... Definition: A specific combination of
   LOFs, no two LOFs being drawn from
   (belonging to) the same MOF.
   Examples:

Specific  Kind of Epi-      General Kind     Specific Disease    Animal Host   Method of
HOF       demiologic        of Disease*      Agent*              Involved*     Diagnosis*
          Index*

HOF 1     Incidence of      leptospirosis    due to L. pomona    in skunks     as determined by
                                                                               isolation from urine

HOF 2     Incidence of      leptospirosis    due to L. canicola  in skunks     as determined by
                                                                               isolation from urine

HOF 3     Point preva-      leptospirosis    due to L. hardjo    in raccoons   as determined by
          lence of                                                             serologic tests

HOF 4     Period preva-     schistosomiasis  due to S. mansoni   in foxes      as determined by
          lence of                                                             isolation from blood

Using only the LOFs and MOFs
listed on the preceding page,
we can construct
3 x 2 x 5 x 3 x 3 = 270 HOFs.


*Specific MOFs
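
The count of 270 HOFs follows from choosing exactly one LOF from each of the five MOFs. The short sketch below enumerates these combinations; the dictionary layout and the name `enumerate_hofs` are illustrative assumptions, not part of the MOD system's file design.

```python
from itertools import product

# The five example MOFs and their LOFs, as listed in this appendix.
MOFS = {
    "Kind of epidemiologic index": [
        "point prevalence", "period prevalence", "incidence"],
    "General kind of disease": ["leptospirosis", "schistosomiasis"],
    "Specific disease agent": [
        "L. pomona", "L. canicola", "L. hardjo",
        "S. mansoni", "S. japonicum"],
    "Animal hosts involved": ["raccoons", "skunks", "foxes"],
    "Method of diagnosis": [
        "isolated from urine", "isolated from blood", "serologic tests"],
}

def enumerate_hofs(mofs):
    """Return every combination that takes exactly one LOF per MOF."""
    return list(product(*mofs.values()))

hofs = enumerate_hofs(MOFS)
print(len(hofs))  # 3 x 2 x 5 x 3 x 3
```

As in the count given above, this is a purely combinatorial total: it includes pairings that would not occur in practice, such as leptospirosis attributed to S. mansoni.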
POF (Poly-Order Factor) ... Definition: A specific combination of
   LOFs, to which at least one MOF has
   contributed more than one LOF.
   Examples:

Specific  Kind of Epi-      General Kind     Specific Disease    Animal Host      Method of
POF       demiologic        of Disease       Agent               Involved         Diagnosis
          Index

POF 1     Incidence of      leptospirosis    due to L. pomona    in foxes         as determined by
                                             and L. canicola                      isolation from blood

POF 2     Incidence of      leptospirosis    due to L. pomona,   in skunks        as determined by
                                             L. canicola,                         isolation from urine,
                                             L. hardjo, and                       isolation from blood,
                                             all other L.                         and serologic tests

POF 3     Point prev-       leptospirosis    due to L. pomona,   in animals,      as determined by
          alence of         and/or           L. canicola,        i.e.,            isolation from urine,
                            schistosomiasis  L. hardjo, and      raccoons,        isolation from blood,
                                             all other L.,       skunks,          and serologic tests
                                             and/or to           foxes, and
                                             S. mansoni,         all other
                                             S. japonicum, and   animal hosts
                                             all other S.

Using only the LOFs and MOFs listed
on page 2A-1 we can construct
7 x 3 x 27 x 7 x 7 = 27,783 POFs.


POF/HOF/MOF/LOF together can be viewed as a kind of hierarchy
or a kind of matrix.

Example:

[Diagram: a grid of LOF cells grouped under the labels POF, HOF,
and MOF, with arrows marking "A HOF", "A POF", and "A MOF".]


With regard to disease data, LOFs and MOFs in general cannot
be meaningfully mapped because, by themselves, they do not convey
enough information. However, HOFs and POFs can be meaningfully
mapped, with each HOF or POF yielding one map. Sometimes a HOF can
consist of only one MOF, which in turn can consist of only one LOF.
Thus, it is possible for a HOF/MOF/LOF structure to consist of a single
description or name. This situation can be viewed as a uni-LOF
uni-MOF HOF, or as a LOF which is also a MOF which, in turn, is also a
(mappable) HOF.

Example:

Multi-LOF HOF 1 - Point prevalence of leptospirosis due
to L. pomona in foxes

Uni-LOF uni-MOF HOF 1 - Type of bedrock

Uni-LOF uni-MOF HOF 2 - Total annual rainfall
In order to map HOFs and POFs, CELs and FAVs must be coupled
with them.

CEL (Common Elements) ... Definition: The set of items which
are necessary to describe every bit
of data. These items do not fall into
an order or hierarchical relationship
and are therefore different from the
POF/HOF/MOF/LOF structure. The five
following CEL items will suffice for
illustration (see Appendix 3).


Item                                      Examples

1. Geographic location by                 W088°31', N37°29';
   (λ, L) or (LO, LA)                     E179°01', S17°09'.

2. Geographic location by                 North America, U.S.A., Illinois, Pope County;
   political unit                         Europe, France, Dordogne, Les Eyzies.

3. Time period for which                  1901-1910; January 1961.
   the data applies

4. Reliability of the data                More reliable; less reliable.

5. Source document number                 00087; 00243.
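
Before a plotter can use a location written in the degree-and-minute form of item 1, it has to be turned into a number. The sketch below shows one way this might be done in Python. The function name `to_decimal_degrees`, the sign convention (west and south negative), and the exact input pattern are assumptions made for illustration, not part of the MOD system as described.

```python
import re

# Hemisphere letter, 2-3 digit degrees, degree sign, 2-digit minutes, apostrophe,
# e.g. "W088°31'" or "N37°29'" (pattern assumed from the examples in the text).
_COORD = re.compile(r"^([NSEW])(\d{2,3})°(\d{2})'$")


def to_decimal_degrees(text: str) -> float:
    """Convert a coordinate such as "W088°31'" to signed decimal degrees."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    hemisphere = match.group(1)
    degrees = int(match.group(2))
    minutes = int(match.group(3))
    if minutes >= 60:
        raise ValueError(f"minutes out of range: {text!r}")
    value = degrees + minutes / 60
    # Assumed convention: south latitudes and west longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


longitude = to_decimal_degrees("W088°31'")
latitude = to_decimal_degrees("N37°29'")
print(round(longitude, 4), round(latitude, 4))
```

A signed decimal form is convenient for arithmetic, but the record itself could just as well keep the original text and convert only at plotting time.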


FAV (Factor-Value) ... Definition: An alphabetic and/or
numeric symbol expressing one member
of the set of all possible results,
the result-set describing, in effect,
the functional relationship between a
specific HOF or POF and a specific CEL.
FAVs, like CELs, do not fall into a
hierarchical type of order or relationship
and are therefore different from
the POF/HOF/MOF/LOF structure.
Examples: 0, 1, 2, 3, ..., ∞;
          0, 0.01, 0.02, 0.07, ..., ∞;
          0-10, 10-20, 20-30, ...;
          Absent, present;
          Absent, rare, common, abundant;
          Shale, limestone, sandstone, granite, ...

Factor ... Definition: A general term including
POFs, HOFs, MOFs, and/or LOFs; i.e.,
in essence, a name or description of
some aspect(s) of disease/environmental
situation(s).

Value ... Definition: A general term including
FAVs (and some other items not yet
precisely defined).

DATA POINT ... Definition: The combination of a
specific CEL, a specific POF or HOF,
and a specific FAV; in essence, a
specific geographic locality (CEL -
geographic location) where, for a given
time point or interval (CEL - time frame),
some person(s) (CEL - source document)
determined (CEL - reliability) (or
observed or measured) a specific value
(FAV) for a specific factor (HOF or POF).
Examples:

*Data Point n:  Geographic location by (λ, L) / Geographic
location by political unit / time period /
reliability / source document // factor
(POF or HOF) // value (FAV).

Data Point 1:  W088°31' N37°29' / North America, U.S.A.,
Illinois, Pope County, Dixon Springs Experimental
Station / 14 June 1962 / more reliable data /
00087 // point prevalence of leptospirosis due
to L. pomona in foxes as determined by isolation
from urine (a HOF) // = 1/13.

Data Point 2:  W091°36' N40°33' / North America, U.S.A.,
Iowa, Burlington County, Keokuk / 13 January
1966 - 27 July 1969 / less reliable data /
00107 // period prevalence of leptospirosis
due to L. pomona and L. canicola and L. hardjo
in raccoons and foxes as determined by isolation
from urine and isolation from blood and
isolation from tissue (a POF) // = 11/4337.

Data Point 3:  W090°00' N40°00' / North America, U.S.A.,
Missouri, St. Louis County, Crystal City /
1951 / reliability unknown / 17734 //
qualitative estimate of leptospirosis due
to L. pomona and L. canicola in skunks as
determined by isolation from urine,
isolation from blood, isolation from tissue,
and serologic tests (a POF) // = Rare.

Data Point 4:  W093°15' N40°01' / North America, U.S.A.,
Kansas, Riley County, Manhattan / 1962 /
more reliable / 01072 // lithologic type
of bedrock (a uni-LOF uni-MOF HOF) // =
Limestone interbedded with shale.


* All data points follow this format.
NAR (Narrative) ... Definition: Supporting non-mappable,
narrative or textual information or
data associated with a specific
data point - information useful to
the person examining the map in order
to increase his understanding of the
data mapped.


Thus, a complete record for a particular data point would
include these items:


CEL // POF or HOF (MOFs and LOFs) // FAV /// NAR (+)
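
As an illustration only, the record layout above (CEL // factor // FAV /// NAR) could be held in a small data structure. The Python sketch below assumes field names such as `location` and `source_document`, which are hypothetical and do not come from the MOD programs; the sample values are those of Data Point 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonElements:
    """The five illustrative CEL items (field names are assumptions)."""
    location: str
    political_unit: str
    time_period: str
    reliability: str
    source_document: str

    def format(self) -> str:
        return " / ".join((
            self.location,
            self.political_unit,
            self.time_period,
            self.reliability,
            self.source_document,
        ))


@dataclass(frozen=True)
class DataPoint:
    cel: CommonElements
    factor: str          # description of a HOF or POF
    value: str           # the FAV
    narrative: str = ""  # optional NAR text, not mappable

    def format(self) -> str:
        record = f"{self.cel.format()} // {self.factor} // = {self.value}"
        if self.narrative:
            record += f" /// {self.narrative}"
        return record


point_1 = DataPoint(
    cel=CommonElements(
        location="W088°31' N37°29'",
        political_unit=("North America, U.S.A., Illinois, Pope County, "
                        "Dixon Springs Experimental Station"),
        time_period="14 June 1962",
        reliability="more reliable data",
        source_document="00087",
    ),
    factor=("point prevalence of leptospirosis due to L. pomona in foxes "
            "as determined by isolation from urine (a HOF)"),
    value="1/13",
)

print(point_1.format())
```

Keeping the CEL items in their own structure mirrors the text's point that CELs are a flat set of descriptors, separate from the hierarchical factor description.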
The specific examples presented schematically as
Map 1 and Map 2 illustrate some of the difficulties in
mapping so relatively simple a factor as disease (leptospirosis)
incidence. Note that "Incidence of leptospirosis" and
"Incidence of leptospirosis due to all leptospires" must both
be treated as a logical sum: "Incidence of leptospirosis due
to L. pomona and L. canicola and (all other LOFs specifying
L. species)."

In relation to this presentation, it is appropriate to
define map, in the context of the MOD program. A map is
considered to be a graphic representation of data distributed
meaningfully in relation to geographic coordinates.
Map 1 - Point prevalence of leptospirosis
due to L. pomona in foxes
as determined by isolation from
urine [a HOF], based on more reliable
data [CEL - reliability] for
1960-1965 [CEL - time period] taken
from all source documents (i.e., from
documents 00001, 00002, ...) [CEL -
source document].

[Schematic map frame: latitude marked at L = N39°33'; longitude marked at λ = W091°34' and λ = W091°11' (CEL - geographic location by (λ, L)).]

Note: This information can be displayed as dots,
shadings, or contours.


Map 2 - Qualitative estimate of
leptospirosis due to L. pomona and
L. canicola and L. hardjo in raccoons
and skunks and foxes as determined by
serologic tests [a POF], based on
more reliable and less reliable data
[CEL - reliability] for 1950-1960
[CEL - time] taken from all source
documents (i.e., from documents
00001, 00002, ...) [CEL - source
document].

APPENDIX  3 


FACTOR CATALOG: A List of the Disease/Environmental Factors
to be used in the Mapping of Disease (MOD) Project.


FACTOR CATALOG. Part I: Common Elements (CELs)

CEL - Geographic location by (λ, L) or (LO, LA):
    Ex: (W118°37', N22°21'),
        (E089°15', S00°37'), etc.

CEL - Geographic location by (political unit):
    Ex: (NAmer., USAmer., Ind., Monroe Co., Bloomington),
        (Afr., Ghana, Accra), etc.

CEL - Geographic location by (UTM military grid coordinates):
    Ex: (37041973), etc.

CEL - Manner of reporting data:
    Ex: Data reported as individual cases
        Data reported grouped for city/town/village
        Data reported grouped for state/province
        Data reported grouped for country/large colony

CEL - Security classification of data:
    Ex: Top secret
        Secret
        Confidential
        Restricted - For official scientific use only
        Unclassified

CEL - Time period for which data applies:
    Ex: 1966,
        1963-1965,
        Mar 1964,
        Jan 1950-Jun 1962,
        17 Nov 58,
        13 Jan 63-21 Aug 64, etc.

CEL - Reliability of the data:
    Ex: Highly reliable
        Not highly reliable
        Undetermined

CEL - Source of the data:
    Ex: 00123 p 1097,
        00087 p 83-91, etc.
FACTOR CATALOG. Part II: Disease Factors (MOFs/LOFs)


MOF - Method of indicating extent of disease within population:
    LOFs: Occurrence
          Abundance
          Number of cases existing at given point in time
          Number of cases beginning during given time interval
          Number of cases existing anytime within given time interval
          Number of deaths during given time interval
          Point prevalence
          Incidence
          Period prevalence
          Mortality

MOF - General kind of disease:
    LOFs: Leptospirosis (= Weil's disease = 7-day fever = etc.)
          Hemorrhagic fever
          Dengue

MOF - Specific disease agent or specific disease type:
    LOFs: Leptospira canicola,
          Leptospira pomona,
          Omsk hemorrhagic fever,
          Crimean hemorrhagic fever, etc.

MOF - Broad category of primary host infected:
    LOFs: Human beings
          Domesticated mammals or birds
          Wild mammals or birds
          Other vertebrates
          Arthropods
          Other invertebrates
          Plants
          Protists

MOF - Specific primary host infected:
    LOFs: Homo sapiens,
          female Australoid human beings 20-30 years old,
          raccoons,
          stink-pot turtle,
          Chihuahua dogs, etc.
MOF - Broad category of intermediate host infected:
    LOFs: (Same as for primary host)

MOF - Specific intermediate host infected:
    LOFs: (Same as for primary host)

MOF - Broad category of reservoir infected:
    LOFs: (Same as for primary host)

MOF - Specific reservoir infected:
    LOFs: (Same as for primary host)

MOF - Broad category of carrier infected:
    LOFs: (Same as for primary host)

MOF - Specific carrier infected:
    LOFs: (Same as for primary host)

MOF - Broad category of vector infected:
    LOFs: (Same as for primary host)

MOF - Specific vector infected:
    LOFs: (Same as for primary host)

MOF - Method of transmission to primary host:
    LOFs: Direct contact with living infected animals
          Direct contact with dead tissue or blood
          Direct contact with excreta
          Indirect occupational contact with water
          Indirect recreational contact with water
          Indirect domestic contact with water
          Indirect occupational contact with soil
          Indirect recreational contact with soil
          Indirect domestic contact with soil
          Bite of vector or carrier

MOF - Epidemiologic state of disease within population:
    LOFs: endemic or enzootic
          hyperendemic or hyperenzootic
          sporadic
          epidemic or epizootic
          pandemic or panzootic

MOF - Kind of outbreak reported:
    LOFs: isolated case; one case
          smaller group of cases; 2-29 cases
          larger group of cases; 30 or more cases
MOF - Duration of outbreak reported:
    LOFs: 10 days,
          7 weeks, etc.

MOF - Type of medical facilities involved in treatments:
    LOFs: Military hospital or clinic
          University/academic hospital or clinic
          Large/urban hospital or clinic
          Small/rural hospital or clinic
          Individual doctor
          Nurse, paramedical person, e.g., aid or "dresser"
          Folk or witch doctor
          None

MOF - Lethality of disease in outbreak reported:
    LOFs: Always fatal
          Often fatal
          Seldom fatal
          Rarely fatal
          Never fatal

MOF - Average severity of disease in outbreak reported:
    LOFs: Fatal
          Severe clinical
          Moderate clinical
          Mild clinical
          Subclinical or asymptomatic

MOF - Average course of disease in outbreak reported:
    LOFs: Acute
          Subacute
          Subchronic
          Chronic

MOF - Immunity (relative) of hosts infected:
    LOFs: Susceptible or not immune
          Naturally immune
          Artificially immunized

MOF - Type of medical facilities involved in diagnosis:
    LOFs: (Same as medical facilities involved in treatments)

MOF - Method of diagnosis:
    LOFs: Clinical observation
          Isolation of organism from water
          Isolation of organism from soil
          Isolation of organism from urine
          Isolation of organism from blood
          Isolation of organism from other body fluid or
          excretory product
          Isolation of organism from tissue
          Serologic tests
          Xenodiagnosis
          Biopsy
          Autopsy

MOF - Type of sample diagnosed:
    LOFs: Randomly selected individuals
          Individuals selected because of their sickness/health
          Individuals selected because of their occupation
          Individuals selected because of their recreational habits
          Individuals selected because of their domestic habits
          Individuals selected because of other characteristics
          (including grouping based on social structure, e.g.,
          family)

MOF - Size of sample diagnosed:
    LOFs: 100 individuals, etc.

MOF - Type of subpopulation sampled for diagnosis:
    LOFs: Natives examined during visit/expedition,
          Patients treated as outpatients by clinic or hospital,
          Patients with suspected leptospirosis,
          Sewer workers,
          Military draftees,
          College or university students,
          Residents of odd-numbered street addresses,
          Live-trapped animals,
          Dairy herd, etc.

MOF - Size of subpopulation sampled for diagnosis:
    LOFs: 1,000 individuals, etc.

MOF - Type of total population sampled for diagnosis:
    LOFs: Urban or larger city (human beings)
          Suburban or smaller town
          Densely settled rural
          Sparsely settled rural
          Concentrated animals (humans)

MOF - Size of total population sampled for diagnosis:
    LOFs: 10,000 individuals, etc.
FACTOR CATALOG. Part III: Environmental Factors (MOFs/LOFs)


This is still under development. It will include MOFs/LOFs dealing
with the following kinds of factors. Obviously, emphasis will be
on those factors considered to be most pertinent.

Soils:  Types 

Temperature 
Moisture content

Chemical-mineral  content  (Including  trace  elements) 

Bedrock:  Types 

Structure 

Chemical-mineral  content  (including  trace  elements) 
Topography,  relief,  elevation,  or  altitude 
Landforms 

Water:  Types  (soil,  surface,  ground) 

Temperature 

Chemical  analyses  (including  pH,  salinity) 

Pollution  or  sewage? 

Evapotranspiration, pan evaporation

Climate  types 

Weather  types 

Temperature:  High,  low,  mean,  ranges,  etc. 

Monthly,  annual,  average  annual,  etc. 

Precipitation: Total amount (monthly, annual, average annual, etc.)
Seasonal  distribution 
Types  (rain,  snow,  etc.) 

Frequency  and  duration  of  dew  formation 


Humidity 

Clouds  and  fog,  clarity  or  transparency  of  atmosphere 

Illumination,  days  of  sunshine,  insolation 

Winds: Direction

Frequency 
Severity  or  force 


Barometric  pressure 


Atmospheric pollution

Natural  disasters  (hurricane,  flood,  dust  storm,  drought,  etc.) 
Magnetism  (terrestrial) 

Lightning  (static  electricity) 

Solar  radiation 
Cosmic-ray  radiation 

Organisms (pertinent) occurring in same area as the disease
under consideration (e.g., names of taxonomic groups,
including wild and domesticated species, and including
vertebrate animals, invertebrate animals, plants, and protista)

Distributions  (geographic)  of  all  such  organisms,  and  their 
abundances 

Degree of concentration versus dispersal of animal populations
in the area

Disease  or  health  conditions  of  such  organisms 

Special  attention  will  be  given  to  known  or  potential: 

Intermediate  hosts 

Reservoirs 

Vectors 

Accidental hosts

Artificial  or  experimental  hosts 

Insecticide  or  drug  resistance  among  such  organisms 

Local  habitats  (grassland,  swamp,  desert,  forest,  etc.) 

Biomes (tropical rain forest, temperate forest, northern
coniferous forest)

Biogeographic region

Population: 

Total 

Density 

Settlement patterns, or type of settled area
(large city, small city, camp, barracks,
family group, etc.)

Age  distribution  of  population  (or  average  year  of  birth) 
Sex distribution of population (or average year of birth)

Types  of  family  groupings 

Average  size  of  family  groupings 

Racial  groups  within  population 

Ethnic  or  nationality  groups  within  population 

Language  groups  within  population 

Socio-economic  (including  caste)  groups  within  population 
Blood-group  distribution 

Distribution  of  other  human  hereditary  or  genetic  factors 

Medical facilities available

Type (large hospital, small hospital, clinic,
mobile aid station, etc.)

Sponsorship (government, military, missionary,
industrial, private, etc.)

Ease of access to facilities
Numbers

Medical personnel available

Type  (doctors,  nurses,  veterinarians,  etc.) 

Numbers 

Treatment  of  water  supply 
Treatment  of  sewage 

Public  health  service  facilities  and  expenditures 

Other pertinent diseases-conditions, e.g., drug addiction,
alcoholism, malnutrition, tuberculosis, mental disorders,
etc., in the population
At  present 
In  the  recent  past 

Average person's medical, hygienic, and sanitary practices
and habits

Average  person's  dietetic  and  nutritional  habits 

Average  person's  clothing  habits 

Average  person's  housing  preferences  and  habits 


Educational  level  of  population 
Literacy 

Number  of  college,  high  school,  and  elementary 
school  graduates 


Land  use 

Type  of  economy  (hunting- gathering,  farming,  machine 
civilization,  etc.) 

Basis  of  economy  (fishing,  forestry,  farming,  mining, 
manufacturing,  etc.) 

Occupations 

Types  present 

Relative proportions (i.e., predominant jobs)

Those involving special risk of exposure to
disease in which interested

Economic levels, distribution of income, standard of
living index

Communications  available,  degree  to  which  used 

Transportation  available,  degree  to  which  used 
Types 

Average mobility of population (including migrations,
travel patterns, troop movements)

Kinds  involving  special  risk  of  exposure  (such  as 
walking  through  jungle,  fording  streams,  etc.) 

Political  movements,  political  views 

Military organization of population (none, militia,
away-from-home active duty, etc.)

Religions 

Superstitions 

Other  customs 

Artistic,  literary,  or  musical  customs  and  activities 

Recreational  and  entertainment  habits 
Kinds 

Frequency indulged in

Special-risk types (water sports, hiking in jungle, etc.)
Crime  statistics 
APPENDIX 4


This survey was designed to evaluate off-line plotters potentially
useful in the MOD Project. The plotters described here have one
basic common characteristic: they all accept magnetic tape as a
source of input.

The maximum speeds given for contouring are those specified by
the respective manufacturer, and the accuracy/reproducibility
figures are for those maximum speeds.

The plotters considered here are listed in alphabetical order,
by manufacturer.

I. The BENSON-LEHNER COMPANY produces six sophisticated
plotter systems. Upon request by a Benson-Lehner plotter
owner, one may obtain their contouring programs, and these
are presently being evaluated by the MOD Project team.

The STE-LTE unit has, as a special feature, the ability to
change pens (four) by program command.

A. Benson-Lehner, STE

    type: flat bed
    size: 30" x 30"
    input: 7-track, 200/500 bits per inch (bpi) magnetic tape
    accuracy: ± 0.015"
    repeatability: ± 0.005"
    max. speed: 2400 points per minute
    price: $47,700
    options:
        9-track, 800 bpi input ($1000)
        48 character alphanumeric printer ($4,500)

B. Benson-Lehner, LTE

    type: flat bed
    size: 42" x 58"
    input: 7-track, 200/500 bpi or 556/800 bpi
    accuracy: ± 0.015"
    repeatability: ± 0.005"
    max. speed: 4500 points per minute
    price: $52,500
    options:
        9-track, 800 bpi input ($1000)
        48 character alphanumeric printer ($4,500)
C. Benson-Lehner, MTD-105

    type: drum
    size: 12"
    input: 7-track, 556/800 bpi magnetic tape
    max. speed: 1.5" per second
    price: $27,000
    options:
        9-track, 800 bpi input ($2,000)

D. Benson-Lehner, MTD-110

    type: drum
    size: 12"
    input: 7-track, 556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $27,000
    options:
        9-track, 800 bpi input ($2,000)

E. Benson-Lehner, MTD-305

    type: drum
    size: 30"
    input: 7-track, 556/800 bpi magnetic tape
    max. speed: 1.5" per second
    price: $29,000
    options:
        9-track, 800 bpi input ($2,000)

F. Benson-Lehner, MTD-310

    type: drum
    size: 30"
    input: 7-track, 556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $29,000
    options:
        9-track, 800 bpi input ($2,000)

* * * * * *

II. CALIFORNIA COMPUTER PRODUCTS, INC. (CALCOMP) produces
eight plotters and five control/tape units. These, in
combination with compatible units, result in a total of
twenty plotter systems. For the purposes of this survey,
the plotters and associated control units were combined
and will be discussed in that fashion. The first number
will be that of the control unit; the second number will
be that of the plotter.

All Calcomp plotter systems have an accuracy and repeatability
of better than ± 0.05 inch.
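
The figure of twenty systems can be checked with a short enumeration. The Python sketch below assumes a compatibility grouping inferred from the model numbers in the listing (control units 470, 750, and 760 with plotters 502, 518, 563, and 565; control units 770 and 780 with plotters 702, 718, 763, and 765). The survey itself does not state the compatibility rules, so the grouping is an assumption.

```python
# Compatibility groups inferred from the model numbers in the listing
# (an assumption; the survey does not spell out which units are compatible).
compatible = {
    ("470", "750", "760"): ("502", "518", "563", "565"),
    ("770", "780"): ("702", "718", "763", "765"),
}

# Each system is named "<control unit>-<plotter>", as in the survey.
systems = [
    f"{control}-{plotter}"
    for controls, plotters in compatible.items()
    for control in controls
    for plotter in plotters
]

control_units = {name.split("-")[0] for name in systems}
plotters = {name.split("-")[1] for name in systems}

print(len(control_units), len(plotters), len(systems))
```

Under this grouping the counts of control units, plotters, and systems should match the five, eight, and twenty stated above.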

A. Calcomp, 470-502

    type: flat bed
    size: 31" x 34"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $32,100

B. Calcomp, 470-518

    type: flat bed
    size: 48" x 72"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 1.5" per second
    price: $50,100

C. Calcomp, 470-563

    type: drum
    size: 30"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 2" per second
    price: $23,100

D. Calcomp, 470-565

    type: drum
    size: 11"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $19,650

E. Calcomp, 750-502

    type: flat bed
    size: 31" x 34"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $38,200
F. Calcomp, 750-518

    type: flat bed
    size: 48" x 72"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 1.5" per second
    price: $55,200

G. Calcomp, 750-563

    type: drum
    size: 30"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 2" per second
    price: $29,200

H. Calcomp, 750-565

    type: drum
    size: 11"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $25,750

I. Calcomp, 760-502

    type: flat bed
    size: 31" x 34"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $45,500

J. Calcomp, 760-518

    type: flat bed
    size: 48" x 72"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 1.5" per second
    price: $53,500

K. Calcomp, 760-563

    type: drum
    size: 30"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 2" per second
    price: $35,500
L. Calcomp, 760-565

    type: drum
    size: 11"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3" per second
    price: $33,050

M. Calcomp, 770-702

    type: flat bed
    size: 31" x 34"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 2.25" per second (8.5" per second in the ZIP mode)
    price: $60,500

N. Calcomp, 770-718

    type: flat bed
    size: 48" x 72"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 0.9" per second (3.4" per second in the ZIP mode)
    price: $79,500

O. Calcomp, 770-763

    type: drum
    size: 30"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3.5" per second (13" per second in the ZIP mode)
    price: $51,500

P. Calcomp, 770-765

    type: drum
    size: 11"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 4.5" per second (16.9" per second in the ZIP mode)
    price: $47,500

Q. Calcomp, 780-702

    type: flat bed
    size: 31" x 34"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 2.25" per second (8.5" per second in the ZIP mode)
    price: $65,500
R. Calcomp, 780-718

    type: flat bed
    size: 48" x 72"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 0.9" per second (3.4" per second in the ZIP mode)
    price: $34,500

S. Calcomp, 780-763

    type: drum
    size: 30"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 3.5" per second (13" per second in the ZIP mode)
    price: $53,500

T. Calcomp, 780-765

    type: drum
    size: 11"
    input: 7/9-track, 200/556/800 bpi magnetic tape
    max. speed: 4.5" per second (16.9" per second in the ZIP mode)
    price: $52,500

****** 

III. ELECTRONIC ASSOCIATES, INC. (EAI) produces one plotter
system which will accept a magnetic tape input.

A. EAI, 3500

    type: flat bed
    size: 30" x 30"
    input: 7-track, 200/556 bpi magnetic tape
    max. speed: no speed given for a magnetic tape setup
    price: $46,750

IV. The GERBER SCIENTIFIC INSTRUMENT CO. produces a number of
plotters that come within the basic requirements of this
study. The first one or two digits (from the left) of a
Gerber reference number are the control unit designation,
whereas the two right-most digits specify the particular plotter.
A. Gerber, 622

    type: flat bed
    size: 50" x 60"
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.0045
    price: $50,000
    options:
        72 character print wheel ($4,800)

B. Gerber, 632

    type: flat bed
    size: 4' x 5'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 33" per second
    accuracy: ± 0.0025
    repeatability: ± 0.0013
    price: $55,000
    options:
        72 character print wheel ($4,800)

C. Gerber, 675

    type: flat bed
    size: 5' x 8'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.005
    price: $68,000
    options:
        a) 5' x 12' plotter size ($2,000)
        b) 5' x 20' plotter size ($4,000)
        c) 72 character print wheel ($4,800)

D. Gerber, 822

    type: flat bed
    size: 50" x 60"
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.0045
    price: $83,000
    options:
        72 character print wheel ($4,800)
E. Gerber, 832

    type: flat bed
    size: 4' x 5'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 3.75" per second
    accuracy: ± 0.0025
    repeatability: ± 0.0015
    price: $91,000
    option (price):
        72 character print wheel ($4,800)

F. Gerber, 875

    type: flat bed
    size: 5' x 8'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.005
    price: $104,000
    options (price):
        a) 5' x 12' plotter size ($2,000)
        b) 5' x 20' plotter size ($4,000)
        c) 72 character print wheel ($4,800)

G. Gerber, 1022

    type: flat bed
    size: 50" x 60"
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.0045
    price: $109,000
    option (price):
        72 character print wheel ($4,800)

H. Gerber, 1032

    type: flat bed
    size: 4' x 5'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 3.75" per second
    accuracy: ± 0.0025
    repeatability: ± 0.0013
    price: $114,000
    option (price):
        72 character print wheel ($4,800)
I. Gerber, 1075

    type: flat bed
    size: 5' x 8'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.005
    price: $127,000
    options (price):
        a) 5' x 12' plotter size ($2,000)
        b) 5' x 20' plotter size ($4,000)
        c) 72 character print wheel ($4,800)

J. Gerber, 2022

    type: flat bed
    size: 50" x 60"
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 10" per second
    accuracy: ± 0.009
    repeatability: ± 0.0045
    price: $110,000
    option (price):
        72 character print wheel ($4,800)

K. Gerber, 2032

    type: flat bed
    size: 4' x 5'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 3.75" per second
    accuracy: ± 0.0025
    repeatability: ± 0.0013
    price: $123,000
    option (price):
        72 character print wheel ($4,800)

L. Gerber, 2075

    type: flat bed
    size: 5' x 8'
    input: 7-track, 200/556/800 bpi magnetic tape
    max. speed: 12.5" per second
    accuracy: ± 0.009
    repeatability: ± 0.005
    price: $136,000
    options (price):
        a) 5' x 12' plotter size ($2,000)
        b) 5' x 20' plotter size ($4,000)
        c) 72 character print wheel ($4,800)
Conclusions:


Since the accuracy and repeatability of all the plotters
described herein meet our requirements, these factors will
not limit our selection. Neither is speed a limiting
factor among the plotters described above. However,
plotter size is. We will require* a plotter area of
30" x 60" or greater. The Benson-Lehner LTE, MTD-305,
and MTD-310, all of the Calcomp control units with a
518, 563, 718, or 763 plotter, and all of the Gerber
plotters meet this size requirement.

If the plotter is to be used with the projected AFIP
IBM 360-Model 30 computer, it must be compatible with that
computer. With this further limitation, acceptable plotters
are only those which can utilize a 9-track magnetic tape
as input. Only the Calcomp series mentioned above meets
this last requirement along with the other requirements.
However, were an intermediate tape-copy device-process
to be added to the system, this would permit use of
several of the other plotters.


* strongly desire
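
The two selection criteria in the conclusions (plotting area and 9-track input) amount to a simple filter over the survey entries. The Python sketch below applies it to a handful of entries taken from the listing. It rests on several assumptions that are not stated in the survey: a drum plotter is treated as having unlimited length, only the standard tape input is counted (extra-cost 9-track options are ignored), and the names `Plotter`, `meets_size`, and `accepts_9_track` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plotter:
    name: str
    width_in: float
    length_in: Optional[float]  # None for drum plotters (continuous paper, assumed)
    tracks: Tuple[int, ...]     # tape formats accepted as standard input


def meets_size(p: Plotter, min_short: float = 30.0, min_long: float = 60.0) -> bool:
    """True if the plotting area is at least 30" x 60"."""
    if p.length_in is None:
        # Drum plotter: only the drum width limits the plot (assumption).
        return p.width_in >= min_short
    short, long_ = sorted((p.width_in, p.length_in))
    return short >= min_short and long_ >= min_long


def accepts_9_track(p: Plotter) -> bool:
    return 9 in p.tracks


# A few entries from the survey, sizes in inches.
survey = [
    Plotter("Benson-Lehner STE", 30, 30, (7,)),
    Plotter("Calcomp 470-502", 31, 34, (7, 9)),
    Plotter("Calcomp 470-518", 48, 72, (7, 9)),
    Plotter("Calcomp 750-563", 30, None, (7, 9)),
    Plotter("Calcomp 470-565", 11, None, (7, 9)),
    Plotter("Gerber 622", 50, 60, (7,)),
]

acceptable = [p.name for p in survey if meets_size(p) and accepts_9_track(p)]
print(acceptable)
```

On this sample only the Calcomp systems with a 518 or 563 plotter pass both tests, which is in line with the conclusion drawn in the text.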


